perm filename CLVALI.MSG[COM,LSP]7 blob
sn#823733 filedate 1986-08-28 generic text, type C, neo UTF8
COMMENT ⊗ VALID 00002 PAGES
C REC PAGE DESCRIPTION
C00001 00001
C00002 00002 Introduction
C00006 ENDMK
C⊗;
∂23-Sep-84 1625 RPG Introduction
To: cl-validation@SU-AI.ARPA
Welcome to the Common Lisp Validation Subgroup.
In order to mail to this group, send to the address:
CL-Validation@su-ai.arpa
Capitalization is not necessary, and if you are directly on the ARPANET,
you can nickname SU-AI.ARPA as SAIL. An archive of messages is kept on
SAIL in the file:
CLVALI.MSG[COM,LSP]
You can read this file or FTP it away without logging in to SAIL.
To communicate with the moderator, send to the address:
CL-Validation-request@su-ai.arpa
Here is a list of the people who are currently on the mailing list:
Person              Affiliation      Net Address
Richard Greenblatt  LMI              "rg%oz"@mc
Scott Fahlman       CMU              fahlman@cmuc
Eric Schoen         Stanford         schoen@sumex
Gordon Novak        Univ. of Texas   novak@utexas-20
Kent Pitman         MIT              kmp@mc
Dick Gabriel        Stanford/Lucid   rpg@sail
David Wile          ISI              Wile@ISI-VAXA
Martin Griss        HP               griss.hplabs@csnet-relay (I hope)
Walter VanRoggen    DEC              wvanroggen@dec-marlboro
Richard Zippel      MIT              rz@mc
Dan Oldman          Data General     not established
Larry Stabile       Apollo           not established
Bob Kessler         Univ. of Utah    kessler@utah-20
Steve Krueger       TI               krueger.ti-csl@csnet-relay
Carl Hewitt         MIT              hewitt-validation@mc
Alan Snyder         HP               snyder.hplabs@csnet-relay
Jerry Barber        Gold Hill        jerryb@mc
Bob Kerns           Symbolics        rwk@mc
Don Allen           BBN              allen@bbnf
David Moon          Symbolics        moon@scrc-stonybrook
Glenn Burke         MIT              GSB@mc
Tom Bylander        Ohio State       bylander@rutgers
Richard Soley       MIT              Soley@mc
Dan Weinreb         Symbolics        DLW@scrc-stonybrook
Guy Steele          Tartan           steele@tl-20a
Jim Meehan          Cognitive Sys.   meehan@yale
Chris Riesbeck      Yale             riesbeck@yale
The first order of business is for each of us to ask people we know who may
be interested in this subgroup if they would like to be added to this list.
Next, we ought to consider who might wish to be the chairman of this subgroup.
Before this happens, I think we ought to wait until the list is more nearly
complete. For example, there are no representatives of Xerox, and I think we
agree that LOOPS should be studied before we make any decisions.
∂02-Oct-84 1318 RPG Chairman
To: cl-validation@SU-AI.ARPA
Now that we've basically got most everyone who is interested on the mailing
list, let's pick a chairman. I suggest that people volunteer for chairman.
The duties are to keep the discussion going, to gather proposals and review
them, and to otherwise administer the needs of the mailing list. I will
retain the duties of maintaining the list itself and the archives, but
otherwise the chairman will be running the show.
Any takers?
-rpg-
∂05-Oct-84 2349 WHOLEY@CMU-CS-C.ARPA Chairman
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 5 Oct 84 23:49:33 PDT
Received: ID <WHOLEY@CMU-CS-C.ARPA>; Sat 6 Oct 84 02:49:51-EDT
Date: Sat, 6 Oct 1984 02:49 EDT
Message-ID: <WHOLEY.12053193572.BABYL@CMU-CS-C.ARPA>
Sender: WHOLEY@CMU-CS-C.ARPA
From: Skef Wholey <Wholey@CMU-CS-C.ARPA>
To: Cl-Validation@SU-AI.ARPA
CC: Dick Gabriel <RPG@SU-AI.ARPA>
Subject: Chairman
I'd be willing to chair this mailing list.
I've been very much involved in most aspects of the implementation of Spice
Lisp, from the microcode to the compiler and other parts of the system, like
the stream system, pretty printer, and Defstruct. A goal of ours is that Spice
Lisp port easily, so most of the system is written in Common Lisp.
Since our code is now being incorporated into many implementations, it's
crucial that it correctly implement Common Lisp. A problem with our code is
that some of it has existed since before the idea of Common Lisp, and we've
spent many man-months tracking the changes to the Common Lisp specification as
the language evolved. I am sure we've got bugs because I'm sure we've missed
"little" changes between editions of the manual.
So, I'm interested first in developing code that will aid implementors in
discovering pieces of the manual they may have accidentally missed, and second
in verifying that implementation X is "true Common Lisp." I expect that the
body of code used for the first purpose will evolve into a real validation
suite as implementors worry about smaller and smaller details.
I've written little validation suites for a few things, and interested parties
can grab those from <Wholey.Slisp> on CMU-CS-C. Here's what I have right now:
Valid-Var.Slisp      Checks to see that all variables and constants
                     in the CLM are there, and satisfy simple tests
                     about what their values should be.
Valid-Char.Slisp     Exercises the functions in the Characters
                     chapter of the CLM.
Valid-Symbol.Slisp   Exercises the functions in the Symbols chapter
                     of the CLM.
Some of the tests in the files may seem silly, but they've uncovered a few bugs
in both Spice Lisp and the Symbolics CLCP.
I think more programs that check things out a chapter (or section) at a time
would be quite valuable, and I'm willing to devote some time to coordinating
such programs into a coherent library.
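To make the shape of such a chapter-level checker concrete, here is a
minimal sketch in the spirit of Valid-Var.Slisp (the function name, the
particular variables checked, and the reporting style are invented for
illustration and are not taken from the actual files):
(defun validate-some-variables ()
  ;; Check that a few CLM-defined variables and constants exist and
  ;; have plausible values; print a line for each failure and return
  ;; T only if everything passed.
  (let ((ok t))
    (flet ((check (description passed)
             (unless passed
               (setq ok nil)
               (format t "~&Failed: ~A~%" description))))
      (check "MOST-POSITIVE-FIXNUM is a positive integer"
             (and (boundp 'most-positive-fixnum)
                  (integerp most-positive-fixnum)
                  (plusp most-positive-fixnum)))
      (check "*PACKAGE* is bound to a package"
             (and (boundp '*package*) (packagep *package*)))
      (check "PI is a float between 3.14 and 3.15"
             (and (boundp 'pi) (floatp pi) (< 3.14 pi 3.15))))
    ok))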
--Skef
∂13-Oct-84 1451 RPG Chairman
To: cl-validation@SU-AI.ARPA
Gary Brown of DEC, Ellen Waldrum of TI, and Skef Wholey of CMU
have volunteered to be chairman of the Validation subgroup. Perhaps
these three people could decide amongst themselves who should be
chairman and let me know by October 24.
-rpg-
∂27-Oct-84 2159 RPG Hello folks
To: cl-validation@SU-AI.ARPA
We now have a chairman of the charter: Bob Kerns of Symbolics. I think
he will make an excellent chairman. For your information I am including
the current members of the mailing list.
I will now let Bob take over responsibility for the discussion.
Dave Matthews       HP               "hpfclp!validation%hplabs"@csnet-relay
Ken Sinclair        LMI              "khs%mit-oz"@mit-mc
Gary Brown          DEC              Brown@dec-hudson
Ellen Waldrum       TI               WALDRUM.ti-csl@csnet-relay
Skef Wholey         CMU              Wholey@cmuc
John Foderaro       Berkeley         jkf@ucbmike.arpa
Cordell Green       Kestrel          Green@Kestrel
Richard Greenblatt  LMI              "rg%oz"@mc
Richard Fateman     Berkeley         fateman@berkeley
Scott Fahlman       CMU              fahlman@cmuc
Eric Schoen         Stanford         schoen@sumex
Gordon Novak        Univ. of Texas   novak@utexas-20
Kent Pitman         MIT              kmp@mc
Dick Gabriel        Stanford/Lucid   rpg@sail
David Wile          ISI              Wile@ISI-VAXA
Martin Griss        HP               griss.hplabs@csnet-relay (I hope)
Walter VanRoggen    DEC              wvanroggen@dec-marlboro
Richard Zippel      MIT              rz@mc
Dan Oldman          Data General     not established
Larry Stabile       Apollo           not established
Bob Kessler         Univ. of Utah    kessler@utah-20
Steve Krueger       TI               krueger.ti-csl@csnet-relay
Carl Hewitt         MIT              hewitt-Validation@mc
Alan Snyder         HP               snyder.hplabs@csnet-relay
Jerry Barber        Gold Hill        jerryb@mc
Bob Kerns           Symbolics        rwk@mc
Don Allen           BBN              allen@bbnf
David Moon          Symbolics        moon@scrc-stonybrook
Glenn Burke         MIT              GSB@mc
Tom Bylander        Ohio State       bylander@rutgers
Richard Soley       MIT              Soley@mc
Dan Weinreb         Symbolics        DLW@scrc-stonybrook
Guy Steele          Tartan           steele@tl-20a
Jim Meehan          Cognitive Sys.   meehan@yale
Chris Riesbeck      Yale             riesbeck@yale
∂27-Oct-84 2202 RPG Correction
To: cl-validation@SU-AI.ARPA
The last message about Bob Kerns had a typo in it. He is chairman
of the validation subgroup, not the charter subgroup. Now you
know my secret about sending out these announcements!
-rpg-
∂02-Nov-84 1141 brown@DEC-HUDSON First thoughts on validation
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 2 Nov 84 11:38:53 PST
Date: Fri, 02 Nov 84 14:34:24 EST
From: brown@DEC-HUDSON
Subject: First thoughts on validation
To: cl-validation@su-ai
Cc: brown@dec-hudson
I am Gary Brown, and I supervise the Lisp Development group at Digital.
I haven't seen any mail about validation yet, so this is to get things
started.
I think there are three areas we need to address:
1) The philosophy of validation - What are we going to validate and
what are we explicitly not going to check?
2) The validation process - What kind of mechanism should be used to
implement the validation suite, to maintain it, to update it and
actually validate Common Lisp implementations?
3) Creation of an initial validation suite - I believe we could disband
after reporting on the first two areas, but it would be fun if we
could also create a prototype validation suite. Plus, we probably
can't do a good job specifying the process if we haven't experimented.
Here are my initial thoughts about these three areas:
PHILOSOPHY
We need to clearly state what the validation process is meant to
accomplish and what it is not intended to accomplish. There are
aspects of a system of interest to users which we cannot validate.
For example, language validation should not be concerned with:
- The performance/efficiency of the system under test. There should
be no timing tests built into the validation suite.
- The robustness of the system. How it responds to errors and the
usefulness of its error messages should not be considerations
in the design of tests.
- Support tools such as debuggers and editors should
not be tested or reported on.
In general, the validation process should report only on whether or
not the implementation is a legal Common Lisp as defined by the
Common Lisp reference manual. Any other information derived from
the testing process should not be made public. The testing process
must not produce information which can be used by vendors as advertisements
for their implementations or to disparage other implementations.
We need to state how we will test language elements which are ill-defined
in the reference manual. For example, if the manual states that it
is "an error" to do something, then we cannot write a test for that
situation. However, if the manual states that an "error is signaled"
then we should verify that.
There are several functions in the language whose action is implementation
dependent. I don't see how we can write a test for INSPECT or for
the printed appearance when *PRINT-PRETTY* is on (however, we can
ensure that what is printed is still READable).
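That READable-output check is easy to express; here is one possible
sketch (the function name and the choice of EQUAL as the comparison are
arbitrary, and a real suite would try many kinds of objects):
(defun prints-readably-p (object)
  ;; Print OBJECT with *PRINT-PRETTY* on, READ the text back, and
  ;; make sure an EQUAL object comes out the other end.
  (let ((*print-pretty* t))
    (equal object (read-from-string (prin1-to-string object)))))

;; e.g. (prints-readably-p '(defun f (x) (list x "a string" #\c 3/4)))
;; should return T in a conforming implementation.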
PROCESS
We need to describe a process for language validation. We could
have a very informal process where the test programs are publicly
available and potential customers acquire and run the tests. However,
I think we need, at least initially, a more formal process.
A contract should be awarded (with ARPA money?) to some third-party
software house to produce and maintain the validation programs, to
execute the tests, and to report the results. I believe the Ada
validation process works something like this:
- Every six months a "field test" version of the validation suite
is produced (and the previous field test version is made the
official version). Interested parties can acquire the programs,
run them, and comment back to SofTech.
- When an implementation wants to validate, it tells some government
agency, gets the current validation suite, runs it, and sends all
the output back.
- An appointment is then set up, and people from the validation agency
come to the vendor and run all the tests themselves, again bundling up
the output and taking it away.
- Several weeks later, the success of the testing is announced.
This seems like a reasonable process to me. We might want to modify
it by:
- Having the same agency that produced the tests validate the results.
- Getting rid of the on-site visit requirement; it's expensive. I
think the vendor needs to include a check for $10,000 when
they request validation. That might be hard for universities
to justify.
Some other things I think need to be set up are:
- A good channel from the test producers to the language definers
for quick clarifications and to improve the manual
- Formal ways to complain about the contents of tests
- Ways for new tests to be suggested. Customers are sure to
find bugs in validated systems, so it would be invaluable if
they could report these as holes in the test system.
A FIRST CUT
To do a good job defining the validation process, I think we need to
try to produce a prototype test system. At Digital we have already
expended considerable effort writing tests for VAX LISP and I assume that
everyone else implementing Common Lisp has done the same. Currently, our
test software is considered proprietary information. However, I believe
that we would be willing to make it public domain if the other vendors
were willing to do the same.
If some kind of informal agreement can be made, we should try to specify
the form of the tests, have everyone convert their applicable tests
to this form and then exchange tests. This will surely generate
a lot of information on how the test system should be put together.
-Gary Brown
∂04-Nov-84 0748 FAHLMAN@CMU-CS-C.ARPA Second thoughts on validation
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 4 Nov 84 07:47:00 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Sun 4 Nov 84 10:47:06-EST
Date: Sun, 4 Nov 1984 10:47 EST
Message-ID: <FAHLMAN.12060893556.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To: cl-validation@SU-AI.ARPA
Subject: Second thoughts on validation
I agree with all of Gary Brown's comments on the proper scope of
validation. The only point that may cause difficulty is the business
about verifying that an error is signalled in all the places where this
is specified. The problem there is that until the Error subgroup does
its thing, we have no portable way to define a Catch-All-Errors handler
so that the validation program can intercept such signals and proceed.
Maybe we had better define such a hook right away and require that any
implementation that wants to be validated has to support this, in
addition to whatever more elegant hierarchical system eventually gets
set up. The lack of such a universal ERRSET mechanism is clearly a
design flaw in the language. We kept putting this off until we could
figure out what the ultimate error handler would look like, and so far
we haven't done that.
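The hook being asked for here amounts to something like the following
sketch, written with the HANDLER-CASE macro that the later ANSI condition
system provides; in 1984 each implementation would have had to supply an
equivalent primitive, so treat this purely as an illustration of the
needed behavior:
(defmacro expect-error (form)
  ;; Evaluate FORM, returning T if it signals an error and NIL if it
  ;; returns normally, without ever landing in the debugger.
  `(handler-case (progn ,form nil)
     (error () t)))

;; A validation module could then write checks such as:
;;   (expect-error (car 3))        ; should be T
;;   (expect-error (car '(1 2)))   ; should be NIL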
As for the process, I think that the validation suite is naturally going
to be structured as a series of files, each of which contains a function
that will test some particular part of the language: a chapter's worth
or maybe just some piece of a chapter such as lambda-list functionality.
That way, people can write little chunks of validation without being
overwhelmed by the total task. Each such file should have a single
entry point to a master function that runs everything else in the file.
These things should print out an informative message whenever they notice
an implementation error. They can also print out some other commentary
at the implementor's discretion, but probably there should be a switch
that will muzzle anything other than hard errors. Finally, there should
be some global switch that starts out as NIL and gets set to T whenever
some module finds a clear error. If this is still NIL after every
module has done its testing, the implementation is believed to be
correct. I was going to suggest a counter for this, but then we might
get some sales rep saying that Lisp X has 14 validation errors and our
Lisp only has 8. That would be bad, since some errors are MUCH more
important than others.
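A skeleton of that structure might look like this (every name in it --
the global flag, the muzzle switch, the sample module, the driver -- is
hypothetical, shown only to make the shape concrete):
(defvar *validation-failed* nil
  "Set to T by any module that detects a clear implementation error.")

(defvar *verbose* nil
  "When NIL, muzzle all commentary other than hard errors.")

(defun report-failure (format-string &rest args)
  ;; Record that something failed and describe it.
  (setq *validation-failed* t)
  (apply #'format t format-string args))

;; Each file would define one master entry point like this one.
(defun test-lambda-lists ()
  (when *verbose*
    (format t "~&Testing lambda-list functionality ...~%"))
  (unless (equal (funcall #'(lambda (a &optional (b 2)) (list a b)) 1)
                 '(1 2))
    (report-failure "~&&OPTIONAL default values are handled incorrectly.~%")))

;; The top-level driver runs every module and reports the verdict.
(defun run-all-modules ()
  (setq *validation-failed* nil)
  (test-lambda-lists)
  ;; ... calls to the other modules go here ...
  (if *validation-failed*
      (format t "~&At least one module found an implementation error.~%")
      (format t "~&No errors found; the implementation is believed correct.~%")))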
To get the ball rolling, we could begin collecting public-domain
validation modules in some place that is easily accessible by arpanet.
As these appear, we can informally test various implementations against
them to smoke out any inconsistencies or disagreements about the tests.
I would expect that when this starts, we'll suddenly find that we have a
lot of little questions to answer about the language itself, and we'll
have to do our best to resolve those questions quickly. Once we have
reached a consensus that a test module is correct, we can add it to some
sort of "approved" list, but we should recognize that, initially at
least, the testing module is as likely to be incorrect as the
implementation.
As soon as possible, this process of maintaining and distributing the
validation suite (and filling in any holes that the user community does
not fill voluntarily) should fall to someone with a DARPA contract to do
this. No formal testing should begin until this organization is in
place and until trademark protection has been obtained for "DARPA
Validated Common Lisp" or whatever we are going to call it. But a lot
can be done informally in the meantime.
I don't see a lot of need for expensive site visits to do the
validating. It certainly doesn't have to be a one-shot win-or-lose
process, but can be iterative until all the tests are passed by the same
system, or until the manufacturer decides that it has come as close as
it is going to for the time being. Some trusted (by DARPA), neutral
outside observer needs to verify that the hardware/software system in
question does in fact run the test without any chicanery, but there are
all sorts of ways of setting that up with minimal bureaucratic hassle.
We should probably not be in the business of officially validating
Common Lisps on machines that are still under wraps and are not actually
for sale, but the manufacturers (or potential big customers) could
certainly run the tests for themselves on top-secret prototypes and be
ready for official validation as soon as the machine is released to the
public.
I'm not sure how to break the deadlock in which no manufacturer wants to
be the first to throw his proprietary validation software into the pot.
Maybe this won't be a problem, if one of the less bureaucratic companies
just decides to take the initiative here. But if there is such a
deadlock, I suppose the way to proceed is first to get a list of what
each company proposes to offer, then to get agreement from each that it
will donate its code if the others do likewise, then to get some lawyer
(sigh!) to draw up an agreement that all this software will be placed in
the public domain on a certain date if all the other companies have
signed the agreement by that date. It would be really nice to avoid
this process, however. I see no advantage at all for a company to have
its own internal validation code, since until that code has been
publicly scrutinized, there is no guarantee that it would be viewed as
correct by anyone else or that it will match the ultimate standard.
-- Scott
∂07-Nov-84 0852 brown@DEC-HUDSON test format
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 7 Nov 84 08:43:57 PST
Date: Wed, 07 Nov 84 11:40:37 EST
From: brown@DEC-HUDSON
Subject: test format
To: cl-validation@su-ai
First, I would hope that submission of test software will not require
any lawyers. I view this as a one-time thing, the only purpose of which
is to get some preliminary test software available to all implementations,
and to give this committee some real data on language validation.
The creation and maintenance of the real validation software should be
the business of the third party funded to do this. I would hope that
they can use what we produce, but that should not be a requirement.
If we are going to generate some preliminary tests, we should develop
a standard format for the tests. I have attached a condensed and
reorganized version of the "developer's guide" for our test system.
Although I don't think our test system is particularly elegant, it
basically works. There are a few things I might change someday:
- The concept of test ATTRIBUTES is not particularly useful. We
have never run tests by their attributes but always run a whole
file full of them.
- The expected result is not evaluated (under the assumption that
if it were, most of the time you would end up quoting it). That
is sometimes cumbersome.
- There is no built-in way to check multiple-value returns. You
make the test case do a MULTIPLE-VALUE-LIST and look at the list.
That is sometimes cumbersome, and relatively easy to fix.
- We haven't automated the analysis of the test results.
- Our test system is designed to handle lots of little tests, and I
think that it doesn't simplify writing complex tests. I have
never really thought about what kind of tools would be useful.
If we want to try to build some tests, I am willing to change our test
system to incorporate any good ideas and make it available.
-Gary
1 A SAMPLE TEST DEFINITION
Here is the test for GET.
(def-lisp-test (get-test :attributes (symbols get)
:locals (clyde foo))
"A test of get. Uses the examples in the text."
((fboundp 'get) ==> T)
((special-form-p 'get) ==> NIL)
((macro-function 'get) ==> NIL)
((progn
(setf (symbol-plist 'foo) '(bar t baz 3 hunoz "Huh?"))
(get 'foo 'bar))
==> T)
((get 'foo 'baz) ==> 3)
((get 'foo 'hunoz) ==> "Huh?")
((prog1
(get 'foo 'fiddle-sticks)
(setf (symbol-plist 'foo) NIL))
==> NIL)
((get 'clyde 'species) ==> NIL)
((setf (get 'clyde 'species) 'elephant) ==> elephant)
((get 'clyde) <error>)
((prog1
(get 'clyde 'species)
(remprop 'clyde 'species))
==> elephant)
((get) <error>)
((get 2) <error>)
((get 4.0 'f) <error>))
Notice that everything added to the property list is taken off again,
so that the test's second run will also work. Notice also that it
isn't wise to start by testing for
((get 'foo 'baz) ==> NIL)
as someone may have decided to give FOO the property BAZ already in
another test.
2 DEFINING LISP TESTS
Tests are defined with the DEF-LISP-TEST macro.
DEF-LISP-TEST {name | (name &KEY :ATTRIBUTES :LOCALS)} [macro]
[doc-string] test-cases
3 ARGUMENTS TO DEF-LISP-TEST
3.1 Name
NAME is the name of the test. Please use the convention of
calling a test FUNCTION-TEST, where FUNCTION is the name of (one of)
the function(s) or variable(s) tested by that test. The symbol name
will have the expanded test code as its function definition and the
following properties:
o TEST-ATTRIBUTES - A list of all the attribute symbols which
have this test on their TEST-LIST property.
o TEST-DEFINITION - The expanded test code. Normally the
function value of the test is compiled; the value of this
property is EVALed to run the test interpreted.
o TEST-LIST - The list of tests with NAME as an attribute.
This list will contain at least NAME.
3.2 Attributes
The value of :ATTRIBUTES is a list of "test attributes". NAME
will be added to this list. Each symbol on this list will have NAME
added to the list which is the value of its TEST-LIST property.
3.3 Locals
Local variables can be specified and bound within a test by
specifying the :LOCALS keyword followed by a list of the form used in a
let var-list. For example, specifying the list (a b c) causes a, b
and c each to be bound to NIL during the run of the test; the list ((a
1) (b 2) (c 3)) causes a to be bound to 1, b to 2, and c to 3 during
the test.
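For example (an invented definition, not one from the actual files), a
test using :LOCALS in both forms might contain cases such as:
(def-lisp-test (push-test :attributes (lists push)
                          :locals (stack (n 3)))
  "A test of PUSH.  STACK starts out bound to NIL and N to 3."
  ((push n stack) ==> (3))
  ((push 4 stack) ==> (4 3)))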
3.4 Documentation String
DOC-STRING is a normal documentation string of documentation type
TESTS. To see the documentation string of a function FOO-TEST, use
(DOCUMENTATION 'FOO-TEST 'TESTS). The documentation string should
include the names of all the functions and variables to be tested in
that test. Mention if there is anything missing from the test, e.g.
tests of the text's examples.
3.5 Test Cases
TEST-CASES (the remainder of the body) is a series of test cases.
Each test case is a list of a number of elements as follows. The
order specified here must hold.
3.5.1 Test Body -
A form to be executed as the test body. If it returns multiple
values, only the first will be used.
3.5.2 Failure Option -
The symbol <FAILURE> can be used to indicate that the test case
is known to cause an irrecoverable error (e.g. it goes into an
infinite loop). When the test case is run, the code is not executed,
but a message is printed to remind you to fix the problem. This
should be followed by normal result options. Omission of this option
allows the test case to be run normally.
3.5.3 Result Options -
3.5.3.1 Comparison Function And Expected Result -
The value of the Test Body will be compared with the Expected Result using the
function EQUAL if you use
==> expected-result
or with the function you specify if you use
=F=> function expected-result
There MUST be white-space after ==> and =F=>, as they are treated as
symbols. Notice that neither function nor expected-result should be
quoted. "Function" must be defined; an explicit lambda form is legal.
"Expected-Result" is the result you expect in evaluating "test-body".
It is not evaluated. The comparison function will be called in this
format:
(function test-body 'expected-value)
3.5.3.2 Errors -
<ERROR> - The test case is expected to signal an error.
This is an alternative to the comparison functions listed above.
There should not be anything after the symbol <ERROR>. It checks that
an error is signaled when the test case is run interpreted, and that
an error is signaled either during the compilation of the case or
while the case is being evaluated when the test is run compiled.
3.5.3.3 Throws -
=T=> - throw-tag result - The test is expected to throw to the
specified tag and return something EQUAL to the specified result.
This clause is only required for a small number of tests. There must
be a space after =T=>, as it is treated as a symbol. This is an
alternative to the functions given above. This does not work compiled
at the moment, due to a compiler bug.
4 RUNNING LISP TESTS
The function RUN-TESTS can be called with no arguments to run all
the tests, with a symbol which is a test name to run an individual
test, or with a list of symbols, each of which is an attribute, to run
all tests which have that attribute. Remember that the test name is
always added to the attribute list automatically.
The special variable *SUCCESS-REPORTS* controls whether anything
will be printed for successful test runs. The default value is NIL.
The special variable *START-REPORTS* controls whether a message
containing the test name will be printed at the start of each test
execution. The default value is NIL. If *SUCCESS-REPORTS* is T, this
variable is treated as T also.
The special variable *RUN-COMPILED-TESTS* controls whether the
"compiled" versions of the specified tests will be run. The default
value is T.
The special variable *RUN-INTERPRETED-TESTS* controls whether the
"interpreted" versions of the specified tests will be run. The
default value is T.
The special variable *INTERACTIVE* controls whether you are
prompted after unexpected errors for whether you would like to enter
debug. It uses yes-or-no-p. To continue running tests after
entering debug after one of these prompts, type CONTINUE. If
*INTERACTIVE* is set to T, the test system will do this prompting.
The default value is NIL.
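By way of illustration, typical calls to RUN-TESTS would look like this
(GET-TEST and the SYMBOLS attribute come from the sample definition at
the beginning of this guide):
(run-tests)             ; run every test that has been defined
(run-tests 'get-test)   ; run the single test named GET-TEST
(run-tests '(symbols))  ; run every test having the SYMBOLS attribute

;; Run only the interpreted versions, with start and success reports:
(let ((*success-reports* t)
      (*run-compiled-tests* nil))
  (run-tests '(symbols)))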
5 GUIDE LINES FOR WRITING TEST CASES
1. The first several test cases in each test should be tests for
the existence and correct type of each of the functions/variables to
be tested in that test. A variable, such as
*DEFAULT-PATHNAME-DEFAULTS*, should have tests like these:
((boundp '*default-pathname-defaults*) ==> T)
((pathnamep *default-pathname-defaults*) ==> T)
A function, such as OPEN, should have these tests:
((fboundp 'open) ==> T)
((macro-function 'open) ==> NIL)
((special-form-p 'open) ==> NIL)
A macro, such as WITH-OPEN-FILE, should have these tests:
((fboundp 'with-open-file) ==> T)
((not (null (macro-function 'with-open-file))) ==> T)
Note that, as MACRO-FUNCTION returns the function definition (if it is
a macro) or NIL (if it isn't a macro), we use NOT of NULL of
MACRO-FUNCTION here. Note also that a macro may also be a special
form, so SPECIAL-FORM-P is not used: we don't care what the result
is.
A special form, such as SETQ, should have these tests:
((fboundp 'setq) ==> T)
((not (null (special-form-p 'setq))) ==> T)
Again, note that SPECIAL-FORM-P returns the function definition (if it
is a special form) or NIL (if it isn't), so we use NOT of NULL of
SPECIAL-FORM-P here. Note also that we don't care if special forms
are also macros, so MACRO-FUNCTION is not used.
2. The next tests should be simple tests of each of your
functions. If you start right in with complicated tests, it can
become difficult to unravel simple bugs. If possible, create one-line
tests which only call one of the functions to be tested.
E.g. for +:
((+ 2 10) ==> 12)
3. Test each of the examples given in the Common Lisp Manual.
4. Then test more complicated cases. Be sure to test both with
and without each of the optional arguments and keyword arguments. Be
sure to test what the manual SAYS, not what you know that we do.
5. Then test for obvious cases which should signal an error.
Obvious things to test are that it signals an error if there are too
few or too many arguments, or if the argument is of the wrong type.
E.g. for +
((+ 2 'a) <ERROR>)
6 HINTS
Don't try to be clever. What we need first is a test of
everything. If we decide that we need "smarter" tests later, we can
go back and embellish. Right now we need to have a test that shows
whether the functions and variables we are supposed to have are there,
and that tells whether at first glance the function is behaving
properly. Even with simple tests this test system will be huge.
Don't write long test cases if you can help it. Think about the
kind of error messages you might get and how easy it will be to debug
them.
Remember that, although the test system guarantees that the test
cases within one test are run in the order defined, no guarantee is
made that your tests will be run in the order in which they are
loaded. Do not write tests which depend on other tests having run
before them.
It is now possible to check for cases which should signal errors;
please do.
I have found it easiest to compose and then debug tests which
have no more than 20 cases. Once a test works I often add a number of
cases, however, and I do have some with over 100 cases. Sometimes,
though, tests with as few as 10 cases can be difficult to unravel,
if, for example, the test won't compile properly. Therefore, if there
is a group of related functions which require many tests each, I am
more likely to have a separate test for each function. If testing one
function is made easier by also testing another (e.g.
define-logical-name, translate-logical-name and delete-logical-name),
it can be advantageous to test them together. It is not a good idea
to make the test cases or returned values very large, however. Also,
when many functions are tested in the same test, it is likely that the
tests can get complicated to debug and/or that some aspect of one of
the functions tested could be forgotten. Therefore, I would prefer
that you NOT write, say, four or five tests, each of which is supposed
to test all of the functions in one part of the manual. I would
prefer that a function have a test which is dedicated to it (even if
it is shared with one or two other functions). This means that some
functions will be used not just in tests of themselves, but also in
tests of related functions; but that is ok.
Remember that each test will be run twice by the test system. So
if your test changes something, change it back.
7 EXAMPLES
7.1 Comparison Function
If you use the "( code =F=> comparison-function result )" format,
the result is now determined by doing (comparison-function code (quote
result)).
(2 =F=> < 4) <=> (< 2 4)
(2 =F=> > 4) <=> (> 2 4)
Notice that the new comparison function you introduce is unquoted.
You may also use an explicit lambda form. For example,
(2 =F=> (lambda (x y) (< x y)) 4) <=> (< 2 4)
7.2 Expected Result
Remember that the returned value for a test case is not
evaluated; so "==> elephant" means is it EQUAL to (quote elephant),
not to the value of elephant.
Consequently, this is in error:
((mapcar #'1+ (list 0 1 2 3)) ==> (list 1 2 3 4))
and this is correct:
((mapcar #'1+ (list 0 1 2 3)) ==> (1 2 3 4))
*Tests Return Single Values*
A test returns exactly one value; a test of a function
which returns multiple values must be written as:
(MULTIPLE-VALUE-LIST form)
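For instance (an invented case, not one from the suite), a check of
FLOOR's two return values would be written:
((multiple-value-list (floor 7 2)) ==> (3 1))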
*Testing Side Effects*
A test of a side effecting function must verify that
the function both returns the correct value and
correctly causes the side effect. The following form
is an example of a body that does this:
((LET (FOO) (LIST (SETF FOO '(A B C)) FOO))
==> ((A B C) (A B C)))
7.3 Throw Tags
The throw tag is also not evaluated.
You must have either "==> <result>" or "=F=> comparison-function
<result>" or "=T=> throw-tag <result>" or "<ERROR>" in each test case.
Remember that you may no longer use <-T- or <-S-. For example, this
would be correct:
((catch 'samson
(throw 'delilah 'scissors))
=T=> delilah scissors)
This test case would cause an unexpected error:
((catch 'samson
(throw 'delilah 'scissors))
==> scissors)
7.4 Expected Failures
Any test case can have the <FAILURE> option inserted to indicate
that the code should not be run. For example, these test cases are
innocuous:
((dotimes (count 15 7)
(setf count (1- count)))
<failure> ==> 7)
((dotimes (count 15 7)
(setf count (1- count)))
<failure> =F=> <= 7)
((throw 'samson (dotimes (count 15 7)
(setf count (1- count))))
<failure> =T=> samson 7)
((car (dotimes (count 15 7)
(setf count (1- count))))
<failure> <error>)
Obviously, you are not expected to introduce infinite loops into the
test cases deliberately.
7.5 Sample Error And Success Reports
A test with cases which all succeed will run with no output if
*SUCCESS-REPORTS* is NIL; if it is set to T, output will look like
this:
************************************************************************
Starting: GET-TEST
A test of get. Uses the examples in the text.
TESTS:GET-TEST succeeded in compiled cases
1 2 3 4 5 6 7 8 9 10 11 12 13 14
TESTS:GET-TEST succeeded in interpreted cases
1 2 3 4 5 6 7 8 9 10 11 12 13 14
If a test case evaluates properly but returns the wrong value, an
error report will be made irrespective of the setting of
*SUCCESS-REPORTS*. The reports include the test case code, the
expected result, the comparison function used, and the actual result.
For example, if you run this test:
(def-lisp-test (+-test :attributes (numbers +))
((+) ==> 0)
((+ 2 3) ==> 4)
((+ -4 -5) =F=> >= 0))
The second and third cases are wrong, so there will be bug reports
like this:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:+-TEST
Error in compiled case 2.
Expected: (+ 2 3)
to be EQUAL to: 4
Received: 5
-----------------------------------------------------------------------
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:+-TEST
Error in compiled case 3.
Expected: (+ -4 -5)
to be >= to: 0
Received: -9
------------------------------------------------------------------------
Unexpected errors cause a report which includes the code which
caused the error, the expected result, the error condition, and the
error message from the error system. As with other errors, these bugs
are reported regardless of the setting of *SUCCESS-REPORTS*. For
example:
(def-lisp-test (=-test :attributes (numbers =))
((fboundp '=) ==> T)
((macro-function '=) ==> NIL)
((special-form-p '=) ==> NIL))
The following report is given if MACRO-FUNCTION is undefined:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
TESTS:=-TEST compiled case 2 caused an unexpected
correctable error in function *EVAL.
Expected: (MACRO-FUNCTION '=)
to be EQUAL to: NIL
The error message is:
Undefined function: MACRO-FUNCTION.
-----------------------------------------------------------------------
8 RUNNING INDIVIDUAL TEST CASES
The interpreted version of a test case can be run individually.
Remember that if any variables are used which are modified in previous
test cases, the results will not be "correct"; for example, any local
variables bound for the test with the :LOCALS keyword are not bound if
a test case is run with this function. The format is
(RUN-TEST-CASE test-name test-case)
Test-name is a symbol; test-case is an integer.
9 PRINTING TEST CASES
There are some new functions:
(PPRINT-TEST-DEFINITION name)
(PPRINT-TEST-CASE name case-number)
(PPRINT-ENTIRE-TEST-CASE name case-number)
(PPRINT-EXPECTED-RESULT name case-number)
In each case, name is a symbol. In the latter three cases,
case-number is a positive integer.
PPRINT-TEST-DEFINITION pretty prints the expanded test code for a
test.
PPRINT-TEST-CASE pretty prints the test code for the body of a
test case; i.e. the s-expression on the left of the arrow.
PPRINT-ENTIRE-TEST-CASE pretty prints the entire expanded test
code for the case in question, i.e. rather more than does
PPRINT-TEST-CASE and rather less than PPRINT-TEST-DEFINITION.
PPRINT-EXPECTED-RESULT pretty prints the expected result for the
test case specified. This cannot be done for a case which is expected
to signal an error, as in that case there is no comparison of expected
and actual result.
∂09-Nov-84 0246 RWK@SCRC-STONY-BROOK.ARPA Hello
Received: from SCRC-STONY-BROOK.ARPA by SU-AI.ARPA with TCP; 9 Nov 84 02:46:18 PST
Received: from SCRC-HUDSON by SCRC-STONY-BROOK via CHAOS with CHAOS-MAIL id 123755; Thu 8-Nov-84 21:32:33-EST
Date: Thu, 8 Nov 84 21:33 EST
From: "Robert W. Kerns" <RWK@SCRC-STONY-BROOK.ARPA>
Subject: Hello
To: cl-validation@SU-AI.ARPA
Message-ID: <841108213326.0.RWK@HUDSON.SCRC.Symbolics.COM>
Hello. Welcome to the Common Lisp Validation committee. Let me
introduce myself, in general terms, first.
I am currently the manager of Lisp System Software at Symbolics,
giving me responsibility for overseeing our Common Lisp effort,
among other things. Before I became a manager, I was a developer
at Symbolics. In the past I've worked on Macsyma, MacLisp and NIL
at MIT, and I've worked on object-oriented systems built on top of them.
At Symbolics, we are currently preparing our initial Common Lisp
offering for release. Symbolics has been a strong supporter of Common
Lisp in its formative years, and I strongly believe that needs to
continue. Why do I mention this? Because I think one form of support
is to contribute our validation tests as we collect and organize them.
I urge other companies to do likewise. I believe we all have
far more to gain than to lose. I believe there will be far more
validation code available in the aggregate than any one company
will have available by itself. In addition, validation tests from
other places have the advantage of bringing a fresh perspective
to your testing. It is all too easy to test for the things you
know you made work, and far too difficult to test for the more
obscure cases.
As chairman, I see my job as twofold:
1) Facilitate communication, cooperation, and decisions.
2) Facilitate the implementation of decisions of the group.
Here's an agenda I've put together of things I think we
need to discuss. What items am I missing? This is nothing
more than my own personal agenda to start people thinking.
First, the development issues:
1) Identify what tests are available. So far, I know of
the contribution by Skef Wholey. I imagine there will be
others forthcoming once people get a chance to get them
organized. (Myself included).
2) Identify a central location to keep the files. We
need someone on the Arpanet to volunteer some space for
files of tests, written proposals, etc. Symbolics is
not on the main Arpanet currently, so we aren't a good
choice. Volunteers?
Is there anyone who cannot get to files stored on
the Arpanet? If so, please contact me, and I'll arrange
to get files to you via some other medium.
3) We need to consider the review process for proposed
tests. How do we get tests reviewed by other contributors?
We can do it by FTPing the files to the central repository
and broadcasting a request to evaluating it to the list.
Would people prefer some less public form of initial evaluation?
4) Test implementation tools. We have one message from Gary Brown
describing his tool. I have a tool written using flavors that I
hope to de-flavorize and propose. I think we would do well to standardize
in this area as much as possible.
5) Testing techniques. Again, Gary Brown has made a number of excellent
suggestions here. I'm sure we'll all be developing experience that we
can share.
6) What areas do we need more tests on?
And there are a number of political, procedural, and policy issues that
need to be resolved.
7) Trademark/copyright issues. At Monterey, DARPA volunteered to
investigate trademarking and copyrighting the validation suite.
RPG: have you heard anything on this?
8) How do we handle disagreements about the language? This was
discussed at the Monterey meeting, and I believe the answer is that, if
we can't work it out, we ask the Common Lisp mailing list, and
especially the Gang of Five, for a clarification. At any rate,
I don't believe it is in our charter to resolve language issues.
I expect we will IDENTIFY a lot of issues, however.
I don't think the rest of these need to be decided any time soon.
We can discuss them now, or we can wait.
9) How does a company (or University) get a Common Lisp implementation
validated, and what does it mean? We can discuss this now, but I
don't think we have to decide it until we produce our first validation
suite.
10) How do we distribute the validation suites? I hope we can do most
of this via the network. I am willing to handle distributing it to
people off the network until it gets too expensive in time or tapes.
We will need a longer-term solution to this, however.
11) Longer term maintenance of the test suites. I think having a
commercial entity maintain it doesn't make sense until we get the
language into a more static situation. I don't think there is
even agreement that this is the way it should work, for that
matter, but we have plenty of time to discuss this, and the situation
will be changing in the meantime.
So keep those cards and letters coming, folks!
∂12-Nov-84 1128 brown@DEC-HUDSON validation process
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 11:25:11 PST
Date: Mon, 12 Nov 84 14:26:14 EST
From: brown@DEC-HUDSON
Subject: validation process
To: cl-validation@su-ai
I am happy to see that another vendor (Symbolics) is interested in sharing
tests. I too believe we all have much to gain from this kind of cooperation.
Since it seems that we will be creating and running tests, I would like
to expand a bit on an issue I raised previously - the ethics of validation.
A lot of information, either explicit or intuitive, concerning the quality
of the various implementations will surely be passed around on this mailing
list. I believe that this information must be treated confidentially. I
know of two recent instances when perceived bugs in our implementation of
Common Lisp were brought up in sales situations. I cannot actively
participate in these discussions unless we all intend to keep this
information private.
I disagree with the last point in Bob's "Hello" mail - the long-term maintenance
of the test suite (however, I agree that we have time to work this out).
I believe that our recommendation should be that ARPA immediately fund a
third party to create/maintain/administer language validation.
One big reason is to guarantee impartiality and to protect ourselves.
If Common Lisp validation becomes a requirement for software on RFPs,
big bucks might be at stake, and we need to guarantee that the process is
impartial and, I think, we want a lot of distance between ourselves and
the validation process. I don't want to get sued by XYZ inc. because their
implementation didn't pass and this caused them to lose a contract and go
out of business.
Of course, if ARPA isn't willing to fund this, then we Common Lispers will
have to do something ourselves. It would be useful if we could get
some preliminary indication from ARPA about their willingness to fund
this type of effort.
∂12-Nov-84 1237 FAHLMAN@CMU-CS-C.ARPA validation process
Received: from CMU-CS-C.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 12:36:09 PST
Received: ID <FAHLMAN@CMU-CS-C.ARPA>; Mon 12 Nov 84 15:35:13-EST
Date: Mon, 12 Nov 1984 15:35 EST
Message-ID: <FAHLMAN.12063043155.BABYL@CMU-CS-C.ARPA>
Sender: FAHLMAN@CMU-CS-C.ARPA
From: "Scott E. Fahlman" <Fahlman@CMU-CS-C.ARPA>
To: brown@DEC-HUDSON.ARPA
Cc: cl-validation@SU-AI.ARPA
Subject: validation process
In-reply-to: Msg of 12 Nov 1984 14:26-EST from brown at DEC-HUDSON
I don't see how confidentiality of validation results can be maintained
when the validation suites are publicly available (as they must be).
If DEC has 100 copies of its current Common Lisp release out in
customer-land, and if the validation programs are generally available to
users and manufacturers alike, how can anyone reasonably expect that
users will not find out that this release fails test number 37? I think
that any other manufacturer had better be without sin before casting the
first stone in a sales presentation, but certainly there will be some
discussion of which implementations are fairly close and which are not.
As with benchmarks, it will take some education before the public can
properly interpret the results of such tests, and not treat the lack of
some :FROM-END option as a sin of equal magnitude to the lack of a
package system.
The only alternative that I can see is to keep the validation suite
confidential in some way, available only to manufacturers who promise to
run it on their own systems only. I would oppose that, even if it means
that some manufacturers would refrain from contributing any tests that
their own systems would find embarrassing. It seems to me that making
the validation tests widely available is the only way to make them
widely useful as a standardization tool and as something that can be
pointed at when a contract wants to specify Common Lisp. Of course, it
would be possible to make beta-test users agree not to release any
validation results, just as they are not supposed to release benchmarks.
I agree with Gary that we probably DO want some organization to be the
official maintainer of the validation stuff, and that this must occur
BEFORE validation starts being written into RFP's and the like. We
would have no problem with keeping the validation stuff online here at
CMU during the preliminary development phase, but as soon as the lawyers
show up, we quit.
-- Scott
∂12-Nov-84 1947 fateman%ucbdali@Berkeley Re: validation process
Received: from UCB-VAX.ARPA by SU-AI.ARPA with TCP; 12 Nov 84 19:47:22 PST
Received: from ucbdali.ARPA by UCB-VAX.ARPA (4.24/4.39)
id AA10218; Mon, 12 Nov 84 19:49:39 pst
Received: by ucbdali.ARPA (4.24/4.39)
id AA13777; Mon, 12 Nov 84 19:43:29 pst
Date: Mon, 12 Nov 84 19:43:29 pst
From: fateman%ucbdali@Berkeley (Richard Fateman)
Message-Id: <8411130343.AA13777@ucbdali.ARPA>
To: brown@DEC-HUDSON, cl-validation@su-ai
Subject: Re: validation process
I think that confidentiality of information on this mailing list is
unattainable, regardless of its desirability.
∂13-Nov-84 0434 brown@DEC-HUDSON Confidentiality loses
Received: from DEC-HUDSON.ARPA by SU-AI.ARPA with TCP; 13 Nov 84 04:34:11 PST
Date: Tue, 13 Nov 84 07:35:21 EST
From: brown@DEC-HUDSON
Subject: Confidentiality loses
To: fahlman@cmu-cs-c
Cc: cl-validation@su-ai
I guess you are right. I can't expect the results of public domain tests
or the communications on this mailing list to be treated confidentially.
So, I retract the issue. I'll make sure that my own comments are not "sensitive".
-Gary
∂18-Dec-85 1338 PACRAIG@USC-ISIB.ARPA Assistance please?
Received: from USC-ISIB.ARPA by SU-AI.ARPA with TCP; 18 Dec 85 13:36:21 PST
Date: 18 Dec 1985 11:17-PST
Sender: PACRAIG@USC-ISIB.ARPA
Subject: Assistance please?
From: Patti Craig <PACraig@USC-ISIB.ARPA>
To: CL-VALIDATION@SU-AI.ARPA
Message-ID: <[USC-ISIB.ARPA]18-Dec-85 11:17:56.PACRAIG>
Hi,
Need some information relative to the CL-VALIDATION@SU-AI
mailing list. Would the maintainer of same please contact
me.
Thanks,
Patti Craig
USC-Information Sciences Institute
∂12-Mar-86 2357 cfry%OZ.AI.MIT.EDU@MC.LCS.MIT.EDU Validation proposal
Received: from MC.LCS.MIT.EDU by SU-AI.ARPA with TCP; 12 Mar 86 23:56:26 PST
Received: from MOSCOW-CENTRE.AI.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 13 Mar 86 02:55-EST
Date: Thu, 13 Mar 86 02:54 EST
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Validation proposal
To: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Message-ID: <860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
We need to have a standard format for validation tests.
To do this, I suggest we hash out a design spec
before we get serious about assigning chapters to implementors.
I've constructed a system which integrates diagnostics and
hacker's documentation. I use it and it saves me time.
Based on that, here's my proposal for a design spec.
GOAL [in priority order]
To verify that a given implementation is or is not correct CL.
To aid the implementor in finding out the discrepancies between
his implementation and the agreed upon standard.
To supplement CLtL by making the standard more precise.
To provide examples for future CLtLs, or at least a format
for machine-readable examples, which will make it easier to
verify that the examples are, in fact, correct.
..... those below are of auxiliary importance
To facilitate internal documentation [documentation
used primarily by implementors while developing]
To give CL programmers a suggested format for diagnostics and
internal documentation. [I argue that every programmer of
a medium to large program could benefit from such a facility].
RELATION of validation code to CL
It should be part of yellow pages, not CL.
IMPLEMENTATION: DESIRABLE CHARACTERISTICS
small amount of code
uses a small, simple subset of CL so that:
1. implementors can use it early in the development cycle
2. It will depend on little and thus be more reliable.
[we want to test specific functions in a controlled way,
not the code that implements the validation software.]
We could, for example, avoid using:
macros,
complex lambda-lists,
sequences,
# reader-macros,
non-fixnum numbers
FEATURES & USER INTERFACE:
simple, uniform, lisp syntax
permit an easy means to test:
- all of CL
- all of the functions defined in a file.
- all of the tests for a particular function
- individual calls to functions.
Allow a mechanism for designating certain calls as
"examples" which illustrate the functionality of the
function in question. Each such example should have
-the call
-the expected result [potentially an error]
-an optional explanation string, i.e.
"This call errored because the 2nd arg was not a number."
----------
Here's an example of diagnostics for a function:
(test:test 'foo
'((test:example (= (foo 2 3) 5) "foo returns the sum of its args.")
;the above is a typical call and may be used in a manual along
;with the documentation string of the fn
(not (= (foo 4 5) -2))
;a diagnostic not worthy of being made an example of. There will
;generally be several to 10's of such calls.
(test:expected-error (foo 7) "requires 2 arguments")
;if the expression is evaled, it should cause an error
(test:bug (foo 3 'bar) "fails to check that 2nd arg is not a number")
;does not perform as it should. Such entries are a convenient place
;for a programmer to remind himself that the FN isn't fully debugged yet.
(test:bug-that-crashes (foo "trash") "I've GOT to check the first arg with numberp!")
))
TEST is a function which sequentially processes the elements of the
list which is its 2nd arg. If an entry is a list whose car is:
test:example evaluate the cadr. if result is non-nil
do nothing, else print a bug report.
test:expected-error evaluate the cadr. If it does not produce an error,
then print a bug report.
test:bug evaluate the cadr. It should return NIL or error.
If it returns NIL or errors, print a "known" bug report;
otherwise print a "bug fixed!" message.
[programmer should then edit the entry to not be wrapped in
a test:bug statement.]
test:bug-that-crashes Don't eval the cadr. Just print the
"known bug that crashes" bug report.
There's a bunch of other possibilities in this area, like:
test:crash-example don't eval the cadr, but use this in documentation
Any entry without a known car will just get evaled; if it returns nil or errors,
print a bug report. The programmer can then fix the bug, or wrap a
test:bug around the call to acknowledge the bug. This helps separate the
"I've seen this bug before" cases from the "this is a new bug" cases.
With an editor that permits evaluation of expressions [emacs and sons],
it's easy to eval single calls or the whole test.
When evaluating the whole test, a summary of what went wrong can be
printed at the end of the sequence like "2 bugs found".
I find it convenient to place calls to TEST right below the definition
of the function that I'm testing. My source code files are about
half tests and half code. I have set up my test function such that
it checks to see if it is being called as a result of being loaded
from a file. If so, it does nothing. Our compiler is set up to
ignore calls to TEST, so they don't get into compiled files.
I have a function called TEST-FILE which reads each form in the file.
If the form is a list whose car is TEST, the form is evaled, else the
form is ignored.
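A sketch of that function, under the same assumptions as the sketch above
(the file-handling details are invented), could be as simple as:
(defun test-file (pathname)
  ;; Read every top-level form in the file; evaluate only the
  ;; (test:test ...) forms and ignore everything else.
  (with-open-file (stream pathname :direction :input)
    (do ((form (read stream nil stream) (read stream nil stream)))
        ((eq form stream))
      (when (and (consp form) (eq (car form) 'test:test))
        (eval form)))))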
Some programmers prefer to keep tests in a separate file from the
source code that they are writing. This is just fine in my implementation,
except that a list of the source code files can't be used in
testing a whole system unless there's a simple mapping between
source file name and test file name.
It's easy to see how a function could read through a file and pull
out the examples [among other things].
Since the first arg to the TEST fn is mainly used to tell the user what
test is being performed, it could be a string explaining in more
detail the category of the calls below, i.e. "prerequisites-for-sequences".
Notice that to write the TEST function itself, you need not have:
macros, &optional, &rest, or &key working, features that minimal lisps
often lack.
Obviously this proposal could use creativity of many sorts.
Our actual spec should just define the file format, though, not
add fancy features. Such features can vary from implementation to
implementation, which will aid evolution of automatic diagnostics and
documentation software.
But to permit enough hooks in the file format, we need insight as to the potential
breadth of such a mechanism. Thus, new goals might also be a valuable
addition to this proposal.
FRY
∂13-Mar-86 1015 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86 10:12:38 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA03979; Thu, 13 Mar 86 10:12:11 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131812.AA03979@isi-vaxa.ARPA>
Date: 13 Mar 1986 1012-PST (Thursday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
<860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
Christopher,
Thanks for the suggestion. Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources. ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.
A single validation suite will eventually be constructed with the existing
tests as a starting point. Therefore, we will probably not seriously consider
a standard until we have examined this extant code. I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.
Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.
Etc.,
RB
∂13-Mar-86 1028 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 13 Mar 86 10:28:21 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA04181; Thu, 13 Mar 86 10:27:56 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603131827.AA04181@isi-vaxa.ARPA>
Date: 13 Mar 1986 1027-PST (Thursday)
To: Christopher Fry <cfry@MIT-OZ%MIT-MC.ARPA>
Cc: berman@ISI-VAXA.ARPA, cl-validation@SU-AI.ARPA
Subject: Re: Validation proposal
In-Reply-To: Your message of Thu, 13 Mar 86 02:54 EST.
<860313025420.4.CFRY@MOSCOW-CENTRE.AI.MIT.EDU>
Christopher,
Thanks for the suggestion. Unfortunately there are already many thousands of
lines of validation code written amongst a variety of sources. ISI is
supposed to first gather these and then figure out which areas are covered,
and in what depth.
A single validation suite will eventually be constructed with the existing
tests as a starting point. Therefore, we will probably not seriously consider
a standard until we have examined this extant code. I'll keep CL-VALIDATION
informed of the sort of things we discover, and at some point I will ask for
proposals, if indeed I don't put one together myself.
Once we know what areas are already covered, we will assign the remaining
areas to the various willing victims (er, volunteers) to complete, and it is
this part of the suite which will be created with a standard in place.
Etc.,
RB
P.S.
I had to change your address (see header) 'cuz for some reason our mail
handler threw up on the one given with your message.
∂17-Mar-86 0946 berman@isi-vaxa.ARPA Re: Validation proposal
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 17 Mar 86 09:46:27 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA11654; Mon, 17 Mar 86 09:46:19 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603171746.AA11654@isi-vaxa.ARPA>
Date: 17 Mar 1986 0946-PST (Monday)
To: cfry%oz@MIT-MC.ARPA
Cc: cl-Validation@su-ai.arpa
Subject: Re: Validation proposal
In-Reply-To: Your message of Mon, 17 Mar 86 04:30 EST.
<860317043024.5.CFRY@DUANE.AI.MIT.EDU>
Thanks, and I look forward to seeing your tests. And yes, I'm sure that
interested parties will get to review the test system before its in place.
RB
------- End of Forwarded Message
∂19-Mar-86 1320 berman@isi-vaxa.ARPA Re: Validation Contributors
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 19 Mar 86 13:20:08 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA08917; Wed, 19 Mar 86 13:19:50 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603192119.AA08917@isi-vaxa.ARPA>
Date: 19 Mar 1986 1319-PST (Wednesday)
To: Reidy.pasa@Xerox.COM
Cc: Reidy.pasa@Xerox.COM, berman@isi-vaxa.ARPA, CL-Validation@su-ai.ARPA
Subject: Re: Validation Contributors
In-Reply-To: Your message of 19 Mar 86 11:29 PST.
<860319-112930-3073@Xerox>
As a matter of fact, in the end it WILL be organized parallel to the book.
For now I'm just gathering the (often extensive) validation suites that have
been produced at various sites. These will need to be evaluated before
assigning tasks to people who want to write some code for this. By that time
we will also have a standard format for these tests so that this new code will
fit in with the test manager.
Send messages to CL-VALIDATION@SU-AI.ARPA rather than the CL general list when
discussing this, unless it is of broader interest of course.
Thanks.
RB
∂27-Mar-86 1332 berman@isi-vaxa.ARPA Validation Distribution Policy
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 27 Mar 86 13:32:16 PST
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA22595; Thu, 27 Mar 86 13:32:06 pst
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8603272132.AA22595@isi-vaxa.ARPA>
Date: 27 Mar 1986 1332-PST (Thursday)
To: CL-Validation@su-ai.arpa
Subject: Validation Distribution Policy
------- Forwarded Message
Return-Path: <OLDMAN@USC-ISI.ARPA>
Received: from USC-ISI.ARPA by isi-vaxa.ARPA (4.12/4.7)
id AA13746; Wed, 26 Mar 86 13:35:26 pst
Date: 26 Mar 1986 16:24-EST
Sender: OLDMAN@USC-ISI.ARPA
Subject: Validation in CL
From: OLDMAN@USC-ISI.ARPA
To: berman@ISI-VAXA.ARPA
Message-Id: <[USC-ISI.ARPA]26-Mar-86 16:24:40.OLDMAN>
Yes, we have tests and a manager. I have started the wheels
moving on getting an OK from management for us to donate them.
Is there a policy statement on how they will be used or
distributed available? ...
-- Dan Oldman
------- End of Forwarded Message
I don't recall any exact final statement of the type of access. I remember
there was some debate on whether it should be paid for by non-contributors,
but was there any conclusion?
RB
∂29-Mar-86 0819 FAHLMAN@C.CS.CMU.EDU Validation Distribution Policy
Received: from C.CS.CMU.EDU by SU-AI.ARPA with TCP; 29 Mar 86 08:19:13 PST
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Sat 29 Mar 86 11:19:51-EST
Date: Sat, 29 Mar 1986 11:19 EST
Message-ID: <FAHLMAN.12194592953.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@isi-vaxa.ARPA (Richard Berman)
Cc: CL-Validation@SU-AI.ARPA
Subject: Validation Distribution Policy
In-reply-to: Msg of 27 Mar 1986 16:32-EST from berman at isi-vaxa.ARPA (Richard Berman)
I don't recall any exact final statement of the type of access. I remember
there was some debate on whether it should be paid for by non-contributors,
but was there any conclusion?
I believe that the idea that free access to the validation code be used
as an incentive to get companies to contribute was discussed at the
Boston meeting, but finally abandoned as being cumbersome, punitive, and
not necessary. Most of the companies there agreed to contribute
whatever validation code they had, and/or some labor to fill any holes
in the validation suite, with the understanding that the code would be
pulled into a reasonably coherent form at ISI and then would be made
freely available to all members of the community. This release would
not occur until a number of companies had contributed something
significant, and then the entire collection up to that point would be
made available at once.
I believe that Dick Gabriel was the first to say that his company would
participate under such a plan, and that he had a bunch of conditions
that had to be met. If there are any not captured by the above
statement, maybe he can remind us of them.
-- Scott
∂16-Jun-86 1511 berman@isi-vaxa.ARPA Validation Suite
Received: from ISI-VAXA.ARPA by SU-AI.ARPA with TCP; 16 Jun 86 15:11:47 PDT
Received: by isi-vaxa.ARPA (4.12/4.7)
id AA19003; Mon, 16 Jun 86 15:11:38 pdt
From: berman@isi-vaxa.ARPA (Richard Berman)
Message-Id: <8606162211.AA19003@isi-vaxa.ARPA>
Date: 16 Jun 1986 1511-PDT (Monday)
To: CL-VALIDATION@su-ai.arpa
Cc: berman@isi-vaxa.ARPA
Subject: Validation Suite
Well, now that some of the contributions to the Great Validation Suite have
begun to filter in, I have been asked to make a report for broad issue on 1
July summarizing the status of all the validation contributions.
I hope this is enough time so that everything can be whipped into shape.
Please do contact me regarding the status of your validation and how it's
progressing. If I haven't yet contacted you, please send me a message. You
may not be on my list. (Also, I cannot seem to reach a few of you via network
for whatever reason).
So...
I DO need your validation contributions.
We ARE putting together a master validation suite, once more of the
contributions arrive.
Thanks.
Richard Berman
USC/ISI
(213) 822-1511
∂09-Jul-86 1213 berman@vaxa.isi.edu Validation Control
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 9 Jul 86 12:09:58 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA27003; Wed, 9 Jul 86 12:09:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607091909.AA27003@vaxa.isi.edu>
Date: 9 Jul 1986 1209-PDT (Wednesday)
To: CL-Validation@SU-AI.ARPA
Cc:
Subject: Validation Control
Well, I've got quite a goodly collection of tests from which to construct a
first pass suite. Here's the situation: Each set of tests (from the various
vendors) uses its own control mechanism, usually in the form of some macro
surrounding a (set of) test(s). Some require an error handler.
By and large all tests take a similar form. Each is composed of a few parts:
1. A form to evaluate.
2. The desired result.
3. Some kind of text for error reporting.
Some versions give each test a unique name.
Some versions specify a test "type", e.g. evaltest means to evaluate the form,
errortest means the test should generate an error (and so the macro could
choose not to do anything with the test if no error handling is present).
What I am looking for is a simple and short proposal for how to
arrange/organize tests in the suite. Currently I am organizing according to
sections in CLtL. This isn't entirely sufficient, especially for some of the
changes that have been accepted since its publication.
So what kind of control/reporting/organizing method seems good to you?
As I am already organizing this, please do not delay. If enough inertia
builds up then whatever I happen to decide will end up as the first pass. So
get your tickets NOW!
RB
∂22-Jul-86 1344 berman@vaxa.isi.edu test control
Received: from VAXA.ISI.EDU by SU-AI.ARPA with TCP; 22 Jul 86 13:44:07 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA16122; Tue, 22 Jul 86 13:44:01 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607222044.AA16122@vaxa.isi.edu>
Date: 22 Jul 1986 1343-PDT (Tuesday)
To: cl-validation@su-ai.arpa
Cc:
Subject: test control
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
4. N tests (or pairs of tests and expected results).
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
6. Test name. Unique for each test.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
8. Error string.
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
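No concrete syntax accompanies the proposal, so purely for concreteness here is
one invented shape a call covering the eight fields might take; DEFINE-TEST, its
keywords, and the toy expansion are illustrations only, not the proposed format.

(defmacro define-test (name &key contributor id type tests
                                 side-effects on-failure error-string)
  (declare (ignore contributor id type))   ; bookkeeping data in a real manager
  `(dolist (pair ,tests t)
     (unless (and (equal (eval (first pair)) (second pair))
                  (every (lambda (f) (eval f)) ,side-effects))
       (format t "~&~A failed: ~@[~A~]~%" ',name ,error-string)
       (eval ,on-failure)
       (return nil))))

(define-test car-of-list                          ; 6. unique test name
  :contributor "Example Corp."                    ; 1. contributor string
  :id 'car                                        ; 2. what is being tested
  :type :eval                                     ; 3. test type
  :tests '(((car '(a b c)) a)                     ; 4. form / expected-result pairs
           ((car nil) nil))
  :side-effects '((numberp pi))                   ; 5. forms that must be non-NIL
  :on-failure '(describe 'car)                    ; 7. form to eval on failure
  :error-string "CAR returned the wrong value.")  ; 8. canned error string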
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Best,
RB
∂23-Jul-86 2104 NGALL@G.BBN.COM Re: test control
Received: from BBNG.ARPA by SAIL.STANFORD.EDU with TCP; 23 Jul 86 21:03:48 PDT
Date: 24 Jul 1986 00:00-EDT
Sender: NGALL@G.BBN.COM
Subject: Re: test control
From: NGALL@G.BBN.COM
To: berman@ISI-VAXA.ARPA
Cc: cl-validation@SU-AI.ARPA
Message-ID: <[G.BBN.COM]24-Jul-86 00:00:45.NGALL>
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>
Date: 22 Jul 1986 1343-PDT (Tuesday)
From: berman@vaxa.isi.edu (Richard Berman)
To: cl-validation@su-ai.arpa
Subject: test control
Message-ID: <8607222044.AA16122@vaxa.isi.edu>
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
4. N tests (or pairs of tests and expected results).
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
6. Test name. Unique for each test.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
8. Error string.
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Best,
RB
--------------------
How about a field that indicates which revision of CL this test
applies to?
-- Nick
∂24-Jul-86 0254 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 02:53:02 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40517; Thu 24-Jul-86 05:55:50-EDT
Date: Thu, 24 Jul 86 05:54 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8607222044.AA16122@vaxa.isi.edu>
Message-ID: <860724055418.1.CFRY@DUANE.AI.MIT.EDU>
I am preparing the first cut at the test suite. Each test is wrapped in a
macro, which I propose below:
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
The macro for test control should allow for the following:
1. Contributor string. Who wrote/contributed it.
Nice to keep around. But won't you generally have a whole bunch of tests
in a file from 1 contributor? You shouldn't have to have their name
on every test.
2. Test I.D. In most cases this would be just the name of the function. In
other cases it may be an identifier as to what feature is being tested, such
as SCOPING.
3. Test type. E.g. Eval, Error, Ignore, etc.
Please be more specific on what this means.
4. N tests (or pairs of tests and expected results).
Typically how large is N? 1, 10, 100, 1000?
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
Particularly for a large N, side effect testing should be textually adjacent to
whatever it's affecting.
6. Test name. Unique for each test.
This should be adjacent to test-id
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
Typically NIL? By "TEST" do you mean: if one of the above N fails, eval this form?
Should it be evaled for each of the N that fail?
8. Error string.
Similar to above?
In number 2 above, the identifier must be selected from amongst those provided
in a database. This database relates identifiers to section numbers (or to
some other ordering scheme) and is used by some form of test management to
schedule the sequence of testing. This allows for automatic ordering. For
example, all the function names are in the database, as well as such "topics"
as scoping, error detection, etc.
For now the ordering database will probably be aligned with the silver book,
but later on I expect it will be organized parallel with the language spec.
←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←←
Note: I've already got the data base in some form. What I want to know from
you as a test contributor (or potential contributor) is: Does the above macro
provide enough information for adequate control and analysis, in your opinion?
Suggestions should be sent soon, because I'm gonna be implementing it in the
next 10 days.
Above is not only ambiguous, but too abstract to get a feel for it.
Send us several examples, both typical and those at the extreme ranges of
size and complexity. I want to see the actual syntax.
Guessing at what you mean here, it looks like it's going to take someone a very
long time to make the tests in such a complex format.
And you lose potential flexibility.
My format distributes control much more locally to each form to be evaled.
And it allows for simple incremental add-ons for things you missed in the spec
the first time around. For example, the "EXPECT-ERROR" fn below is such an add-on.
It is not an integral part of the diagnostic-controller, which itself is
quite simple.
To re-iterate my plan:
There's a wrapper for a list of forms to evaluate, typically 5 to 20 forms.
Each form is evaled and if it returns NON-NIL, it passes.
Example:
(test '+
(= (+ 2 3) 5)
(expect-error (+ "2" "3")) ;returns T if the call to + errors
(setq foo (+ 1 2))
(= foo 3) ;tests side effect. The forms are expected to be evaled sequentially.
;anything that depends on a particular part of the environment to be "clean"
;before it tests something should have forms that clean it up first,
; like before the above call to setq you might say (makunbound 'foo)
(progn (bar) t) ; one way of testing a form where it is expected not to error
;but don't care if it returns NIL or NON-NIL. If you found you were using this
;idiom a lot, you could write DONT-CARE trivially, as an add-on.
)
If you really wanted to declare that a particular call tested a side-effect, or that
a particular call produced a side-effect, you could write a small wrapper fn for it,
but I'd guess that wouldn't be worth the typing. Such things should be obvious from
context.
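For what it is worth, the EXPECT-ERROR add-on used above might be written roughly
like this, assuming a Lisp with the (still unstandardized) condition system:

(defmacro expect-error (form)
  ;; T only if evaluating FORM signals an error, NIL otherwise.
  `(handler-case (progn ,form nil)
     (error () t)))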
Programmers are very reluctant to write diagnostics, so let's try to
make it as painless as possible. Maybe there could be some
macros that would fill in certain defaults of your full-blown format.
One of the things that's so convenient about my mechanism is that
a hacker can choose to, with a normal lisp text editor, eval part of
a call, a whole call, a group of calls [by selecting the region],
a whole TEST, or via my fn "test-file" a whole file.
[I also have "test-module" functionality for a group of files.]
Having this functionality makes the diagnostics more than just
a "validation" suite. It makes it a real programming tool.
And thus it will get used more often, and the tests themselves will
get performed more often.
This will lead to MORE tests as well as MORE TESTED tests, which
also implies that hackers/implementors will have more tested implementations,
which, after all, furthers the ultimate goal of having accurate
implementations out there.
.....
Before settling on a standard format, I'd also recommend just
converting a large file of tests into the proposed format
[before implementing the code that performs the test].
This will help you feel redundancies in the format
by noticing your worn out fingers.
But it will also help you see what parts of the syntax are
hard to remember and in need of more keywords or better named
functions, or less nested parens.
If the proposed format passes this test, it can be used as the
TEST code for the TEST software itself, as well as testing CL.
If not, you didn't waste time implementing a bad spec.
Despite the volume of my comments, I'm glad you're getting
down to substantial issues on what features to include.
CFry
∂24-Jul-86 1053 berman@vaxa.isi.edu Re: test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 10:50:55 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA04581; Thu, 24 Jul 86 10:49:05 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241749.AA04581@vaxa.isi.edu>
Date: 24 Jul 1986 1049-PDT (Thursday)
To: NGALL@G.BBN.COM
Cc: cl-validation@SU-AI.ARPA, berman@ISI-VAXA.ARPA
Subject: Re: test control
In-Reply-To: Your message of 24 Jul 1986 00:00-EDT.
<[G.BBN.COM]24-Jul-86 00:00:45.NGALL>
'Cuz the whole suite will be for a particular revision. There will
be no tests in the suite that do not apply to the particular level/revision.
RB
∂24-Jul-86 1148 marick%turkey@gswd-vms.ARPA Re: test control
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Jul 86 11:22:08 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA11926; Thu, 24 Jul 86 13:20:56 CDT
Message-Id: <8607241820.AA11926@gswd-vms.ARPA>
Date: Thu, 24 Jul 86 13:20:47 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Re: test control
I have trouble visualizing what a test looks like. Could you provide
examples?
Some general comments:
1. I hope that often-unnecessary parts of a test (like contributor
string, error string, form-to-evaluate-if-test-fails) are optional.
2. It would be nice if the test driver were useful for small-scale
regression testing. (That is, "I've changed TREE-EQUAL. O driver,
please run all the tests for TREE-EQUAL.") It seems you have this in
mind, but I just wanted to reinforce any tendencies.
3. The format of the database should be published, since people will
want to write programs that use it.
4. It's very useful to have an easy way of specifying the predicate to
use when comparing the actual result to the expected result.
The test suite ought to come with a library of such predicates.
5. I'd like to see a complete list of test types. What a test type is
is a bit fuzzy, but we have at least the following:
ordinary -- form evaluated and compared to unevaluated expected result.
(This is a convenience; you get tired of typing ')
eval -- form evaluated and compared to evaluated expected result.
fail -- doesn't run the test, just notes that there's an error. This
is used when an error breaks the test harness; it shouldn't
appear in the distributed suite, of course, but it will be
useful for people using the test suite in day-to-day regression
testing.
error -- the form is expected to signal an error; it fails if it does
not.
is-error -- if the form signals an error it passes. If it doesn't signal
an error, it passes only if it matches the "expected" result.
We use this to make sure that some action which is defined to
be "is an error" produces either an error or some sensible result.
It may not be appropriate for the official suite. (Note that there
really should be an evaluating and a non-evaluating version.)
6. Then you need to cross all those test types with a raft of issues
surrounding the compiler. Like:
a. For completeness, you should run the tests interpreted, compiled with
#'COMPILE, and compiled with #'COMPILE-FILE. (What COMPILE-FILE does
might not be a strict superset of what COMPILE does.)
b. Suppose you're testing a signalled error. What happens if the error
is detected at compile time? (This is something like the IS-ERROR case
above: either the compile must fail or running the compiled version
should do the same thing the interpreted version does.)
c. It may be the case that compiled code does less error checking than
interpreted code. OPTIMIZE switches can have the same effect. So you may
want to write tests that expect errors in interpreted code, but not in
compiled code. (This, again, is probably not relevant to the official test
suite, but, again, the easier it is to tune the test suite, the happier
implementors will be.)
7. What does the output look like? This test suite is going to be
huge, so it's especially important that you be able to easily find
differences between successive runs.
∂24-Jul-86 1546 berman@vaxa.isi.edu
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 12:38:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06466; Thu, 24 Jul 86 12:35:53 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607241935.AA06466@vaxa.isi.edu>
Date: 24 Jul 1986 1235-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA
Cc: cl-validation@su-ai.arpa
Subject:
Let me clarify. First, I don't think this macro is used to control testing so
much as it is to help maintain the actual testing suite itself. The testing
suite is supposed to eventually incarnate under ISI's FSD data base facility,
as described in the proposal that I offered to one and all a short while back.
What this macro should do is allow me to build a test suite from amongst all
the tests. With that in mind:
1. I hope that often-unnecessary parts of a test (like contributor
string, error string, form-to-evaluate-if-test-fails) are optional.
Probably, except for contributor. The others can be NIL or created from other
data.
2. It would be nice if the test driver were useful for small-scale
regression testing. (That is, "I've changed TREE-EQUAL. O driver,
please run all the tests for TREE-EQUAL.") It seems you have this in
mind, but I just wanted to reinforce any tendencies.
Sure.
3. The format of the database should be published, since people will
want to write programs that use it.
Unlikely. See above re: FSD. It can't "be published" as it is just part of a
live environment.
4. It's very useful to have an easy way of specifying the predicate to
use when comparing the actual result to the expected result.
The test suite ought to come with a library of such predicates.
Well -- you could be a little more clear on this. Like what? Also, it is the
contributors who will write these tests. I imagine that most of the time an
EQ or EQUAL type would be used, and other less typical or special purpose
predicates will probably not be useful to other contributors.
5. I'd like to see a complete list of test types. What a test type is
is a bit fuzzy, but we have at least the following:
ordinary -- form evaluated and compared to unevaluated expected result.
(This is a convenience; you get tired of typing ')
eval -- form evaluated and compared to evaluated expected result.
fail -- doesn't run the test, just notes that there's an error. This
is used when an error breaks the test harness; it shouldn't
appear in the distributed suite, of course, but it will be
useful for people using the test suite in day-to-day regression
testing.
error -- the form is expected to signal an error; it fails if it does
not.
is-error -- if the form signals an error it passes. If it doesn't signal
an error, it passes only if it matches the "expected" result.
We use this to make sure that some action which is defined to
be "is an error" produces either an error or some sensible result.
It may not be appropriate for the official suite. (Note that there
really should be an evaluating and a non-evaluating version.)
Sounds to me like you got the idea. These are classifications of tests used
to control the testing process. In addition, this being a part of the
database, one could create a test suite for just certain classes of tests.
And as for compiler stuff--for now it will probably just allow you to test
each test interpreted, compiled or both (possibly not in the very first cut).
Other issues will be taken up as the suite develops.
7. What does the output look like? This test suite is going to be
huge, so it's especially important that you be able to easily find
differences between successive runs.
Each failing test will give some kind of report, identifying the test. As the
suite develops, more sophisticated reporting will be developed that fills the
needs of developers. How's that for using the word "develop" too much?
RB
∂24-Jul-86 1549 berman@vaxa.isi.edu test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 13:05:59 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06785; Thu, 24 Jul 86 13:03:54 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607242003.AA06785@vaxa.isi.edu>
Date: 24 Jul 1986 1303-PDT (Thursday)
To: cfy@OZ.AI.MIT.EDU
Cc: cl-validation@su-ai.arpa
Subject: test control
Please see my message to Marick, which answers some of your questions. As for
the others:
1. Contributor string. Who wrote/contributed it.
Nice to keep around. But won't you generally have a whole bunch of tests
in a file from 1 contributor? You shouldn't have to have their name
on every test.
Nope. The tests will be separated into the various sections of the book under
which the test best fits. These will then be assembled into a test for that
section. Note also Marick's comments re regression analysis.
3. Test type. E.g. Eval, Error, Ignore, etc.
Please be more specific on what this means.
See Marick's comments.
4. N tests (or pairs of tests and expected results).
Typically how large is N? 1, 10, 100, 1000?
I imagine N is very small. It should be what you could call a "testing unit"
which does enough to conclusively report success/failure of some specific
thing being tested.
5. Side effects testing. With each test from #4 above it should be possible
to give n forms which must all evaluate to non-NIL.
Particularly for a large N, side effect testing should be textually adjacent to
whatever it's affecting.
Certainly would enhance readability/maintainability, etc.
6. Test name. Unique for each test.
This should be adjacent to test-id
Sure.
7. Form to evaluate if test fails. This may be useful later to help analyze
beyond the first order.
Typically NIL? By "TEST" do you mean: if one of the above N fails, eval this form?
Should it be evaled for each of the N that fail?
Well, each thing wrapped by this macro should be a "testing unit" as above, so
if any of N fails the remaining tests in that macro probably won't be
executed, and this form will then be evaluated.
8. Error string.
Similar to above?
Not at all. This is what to say in the event of an error. It is optional
because a reporting mechanism can construct a message, but for more
readability or for other reasons (as deemed useful by the test implementor) a
canned string can be printed as well.
Above is not only ambiguous, but too abstract to get a feel for it.
Send us several examples, both typical and those at the extreme ranges of
size and complexity. I want to see the actual syntax.
Well, I hope this and other messages help that problem. As for syntax - until
it is implemented, there isn't any. If you still don't see why this data is
needed, or if it isn't clear about the "database" stuff I mentioned, please
call me.
Guessing at what you mean here, it looks like it's going to take someone a very
long time to make the tests in such a complex format.
And you lose potential flexibility.
I couldn't disagree more. I have received a great deal of testing material
and this is not much more "complex" than most. It actually allows (in
conjunction with the testing database) a far more flexible testing regimen
than any I've seen.
(As for your methodology -- it has much merit. Perhaps my use of parts of it
is too disguised here?)
Programmers are very reluctant to write diagnostics, so let's try to
make it as painless as possible. Maybe there could be some
macros that would fill in certain defaults of your full-blown format.
Only new contributions need to be in this format. I would expect a wise
programmer to come up with a number of ways to automate this. I for one would
not type my company name (contributor ID) for each one.
One of the things that's so convenient about my mechanism is that
a hacker can choose to, with a normal lisp text editor, eval part of
a call, a whole call, a group of calls [by selecting the region],
a whole TEST, or via my fn "test-file" a whole file.
[I also have "test-module" functionality for a group of files.]
Having this functionality makes the diagnostics more than just
a "validation" suite. It makes it a real programming tool.
And thus it will get used more often, and the tests themselves will
get performed more often.
This will lead to MORE tests as well as MORE TESTED tests, which
also implies that hackers/implementors will have more tested implementations,
which, after all, furthers the ultimate goal of having accurate
implementations out there.
Certainly one goal is to make the tests useful. We hope to have an online
(via network) capability for testers to request their own test suites, as
customized as we can. For others, a testing file can be generated. Have you
read the ISI proposal for CL support?
.....
Before settling on a standard format, I'd also recommend just
converting a large file of tests into the proposed format
[before implementing the code that performs the test].
Am doing that now, with the CDC test suite.
This will help you feel redundancies in the format
by noticing your worn out fingers.
But it will also help you see what parts of the syntax are
hard to remember and in need of more keywords or better named
functions, or less nested parens.
You bet.
If the proposed format passes this test, it can be used as the
TEST code for the TEST software itself, as well as testing CL.
If not, you didn't waste time implementing a bad spec.
As with any large (and many smaller) systems, the test suite will go through
the various stages of incremental development. I'm sure we'll discard a
paradigm or two on the way.
Despite the volume of my comments, I'm glad you're getting
down to substantial issues on what features to include.
CFry
Thank you.
I hope this is helpful.
RB
∂24-Jul-86 1740 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 24 Jul 86 17:22:22 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 24 Jul 86 20:22:36-EDT
Date: Thu, 24 Jul 1986 20:22 EDT
Message-ID: <FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 24 Jul 1986 15:35-EDT from berman at vaxa.isi.edu (Richard Berman)
Maybe I should have read the earlier proposal more carefully. This
"incarnate in FSD" business sounds scary.
I had the impression that FSD was an internal tool that you would be
using to maintain the validation suite, but that the validation suite
itself would be one or more Common Lisp files that you can pass out to
people who want to test their systems. Is that not true? (This is
separate from the issue of whether validation is done at ISI or
elsewhere; the point is that it should be possible to release the test
suite if that's what we want to do.) I would hope that the testing code
can be passed around without having to pass FSD around with it (unless
FSD is totally portable and public-domain).
-- Scott
∂25-Jul-86 0047 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 00:47:11 PDT
Received: from DUANE.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 40592; Fri 25-Jul-86 03:50:09-EDT
Date: Fri, 25 Jul 86 03:47 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607242003.AA06785@vaxa.isi.edu>
Message-ID: <860725034736.3.CFRY@DUANE.AI.MIT.EDU>
I apologize for not including the text of the messages I'm replying
to here. Since it's more than one, I have a hard time integrating them.
.......
Sounds like you're basically doing the right stuff, but I still
don't see why you don't present us with an example.
You mentioned that you wouldn't have one until the implementation
was complete, then you said you were converting the CDC tests
already. ???
I surmise that ISI will be using some fancy database format such that you'll
need some hairy hardware and software even to get ASCII out of it.
But the interface to that will, I hope, be files containing
lisp expressions, that can be read with the reader and maybe even
tested by evaling them as is or with some modification.
It's this format that I'd like to see an example of.
There was a question about a published spec that you dodged.
I presume there will be a fixed format, and we'll all want to use it.
Since everybody is going to want to use certain "macros" for helping them
manipulate the stuff, can't we just standardize on those too?
To refer to the original issue,
when an implementor sends you a file, it should say just once
at the top of the file who wrote the tests, and what version of CL
they apply to. Actually a list of versions or range of versions may be more
appropriate.
Since it will be a smaller and less controversial amount of code, we can
just standardize on your implementation rather than haggle over
English descriptions, though I hope your implementation will at least
include doc strings. Will this code be Public Domain, or at least
given out to test contributors?
In a bunch of cases you refer to giving a test form and including
an expected value. The issue arises, how do you compare the two?
My mechanism just uses the full power of CL to do comparisons
in the most natural way. There are not 2 parts to a call,
there's just one. And the kind of comparison is integral with
the call, e.g.: (eq foo foo)
(not (eq foo bar))
(= 1 1.0)
(equalp "foo" "FOO")
There are lots of comparisons, so don't try to special case each one.
When an error system is settled upon, I hope there will be an errorp fn.
Of course, this ends up testing "EQ" at the same time it tests "FOO",
but I think that's, in general, unavoidable.
Anyway if EQ is broken, the implementation doesn't have much of a chance.
You said that each form of a group would be tested and when the first
one fails, you stop the test and declare that "REDUCE" or whatever is
broken. I think we can provide higher resolution than that without
much cost, i.e. (reduce x y z) is broken.
Such resolution will be very valuable to the bug fixer, and even
for someone evaluating the language. Since you dodged my
question of "How big is N" by saying "very small" instead of
1 -> 5 or whatever, I can't tell what resolution your mechanism
is really going to provide.
∂25-Jul-86 1036 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 10:36:30 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA16963; Fri, 25 Jul 86 10:35:43 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251735.AA16963@vaxa.isi.edu>
Date: 25 Jul 1986 1035-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Thu, 24 Jul 1986 20:22 EDT.
<FAHLMAN.12225351684.BABYL@C.CS.CMU.EDU>
FSD will be used to maintain a number of things relating to our support of CL.
It need not be distributed itself. The intended use is to help order and keep
track of the various tests. For example, there may be tests which are
questionable. They would be in the database, but not readily accessible for
the purposes of making a test file until they were verified.
Yes, of course it is files that will be distributed. FSD can be used to help
create the testing files. I did note on the proposal (which I did not author)
that ISI intends to send a "team" to do the validation at the manufacturer's
site. Exactly why (except for official reporting) I don't know.
The test suite, as "incarnated" in FSD, will exist as a bunch of objects, each
of which represents a test and some data about the test. There are not really
files, as such, in FSD.
If this still sounds scary, let me know. One of the purposes of all this is
to eventually allow network access to this database (and for other purposes).
RB
∂25-Jul-86 1051 berman@vaxa.isi.edu Re: test control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 10:50:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17112; Fri, 25 Jul 86 10:49:50 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251749.AA17112@vaxa.isi.edu>
Date: 25 Jul 1986 1049-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: test control
In-Reply-To: Your message of Fri, 25 Jul 86 03:47 EDT.
<860725034736.3.CFRY@DUANE.AI.MIT.EDU>
I sort of thought the notion of a "test unit" would communicate the "N" you
refer to. Let me be more specific. N is 1. But there may be more than one
form. N here refers to the number of tests of the function/topic being
tested. Other forms can set things up, etc. If any form fails, it is THAT
TEST that is reported to have failed, not the entirety of the function/topic.
As for the conversion -- I am mostly working with my organizing database (the
one that will be used to help order the tests) with the CDC stuff as a test
case.
I would sure like to hear more ideas, and from others too. I think now that I
would modify this testing macro a bit. I think the "test" proper is in 3
parts. A setup, the actual test form, and an un-setup. Obviously only the
test form is required.
I do somewhat like the idea of just using a lisp-form, and if it is supposed
to return some result, just ensure it returns non-nil for "OK". That is,
using your simpler (pred x y) where pred tests the result, x is the test form,
and y is the desired result. I still would like to formalize it somewhat into
something that more clearly shows which is the test form and the required
result, as well as the predicate. See some of the test classes that Marick
describes. Not all of them care for a result, and I would like that to be
more explicit from the layout of the test text.
I am sorry you feel I am being evasive. I could just make arbitrary
decisions, but in fact I am relaying all the information, ideas and activities
as they actually are.
RB
∂25-Jul-86 1111 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 11:10:53 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 14:10:54-EDT
Date: Fri, 25 Jul 1986 14:10 EDT
Message-ID: <FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986 13:35-EDT from berman at vaxa.isi.edu (Richard Berman)
That all sounds fine, as long as you people at ISI are able to cause FSD
to create a file that represents a portable test suite with the
parameters you specify (version of Common Lisp, what areas tested, etc.)
If people can come in over the net and produce such portable files for
their own use, so much the better.
-- Scott
∂25-Jul-86 1127 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 11:23:13 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17533; Fri, 25 Jul 86 11:22:38 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607251822.AA17533@vaxa.isi.edu>
Date: 25 Jul 1986 1122-PDT (Friday)
To: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986 14:10 EDT.
<FAHLMAN.12225546120.BABYL@C.CS.CMU.EDU>
That's my feeling too. By the way, when you say "versions of common lisp",
just what do you mean? Are there officially recognized versions? Or is all
ongoing activity still towards a version 1?
Thanks.
RB
∂25-Jul-86 1254 FAHLMAN@C.CS.CMU.EDU FSD
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 12:54:23 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Fri 25 Jul 86 15:54:18-EDT
Date: Fri, 25 Jul 1986 15:54 EDT
Message-ID: <FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: cl-validation@SU-AI.ARPA
Subject: FSD
In-reply-to: Msg of 25 Jul 1986 14:22-EDT from berman at vaxa.isi.edu (Richard Berman)
The assumption is that once we have ANSI/ISO approval for one version,
there will be updates to the standard at periodic and not-too-frequent
intervals.
-- Scott
∂25-Jul-86 1541 berman@vaxa.isi.edu Re: FSD
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Jul 86 15:40:42 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA20104; Fri, 25 Jul 86 15:39:31 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607252239.AA20104@vaxa.isi.edu>
Date: 25 Jul 1986 1539-PDT (Friday)
To: Fahlman@C.CS.CMU.EDU
Cc: cl-validation@SU-AI.ARPA
Subject: Re: FSD
In-Reply-To: Your message of Fri, 25 Jul 1986 15:54 EDT.
<FAHLMAN.12225564981.BABYL@C.CS.CMU.EDU>
Thanks, that clears it up for me.
RB
∂26-Jul-86 1447 marick%turkey@gswd-vms.ARPA Test suite
Received: from GSWD-VMS.ARPA by SU-AI.ARPA with TCP; 26 Jul 86 14:47:39 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA15192; Sat, 26 Jul 86 16:49:06 CDT
Message-Id: <8607262149.AA15192@gswd-vms.ARPA>
Date: Sat, 26 Jul 86 16:49:02 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu
Cc: cl-validation@su-ai.arpa
In-Reply-To: berman@vaxa.isi.edu's message of 24 Jul 1986 1235-PDT (Thursday)
Subject: Test suite
Equality predicates (mostly digression on test-case syntax):
In any test, you'll have to write down the test case, the expected
results, and the way you test the expected vs. actual results.
The obvious way to do it is
(eq (car '(a b c)) 'a)
The way we do it (a way derived from something the DEC people put in
this mailing list a long time ago) is
( (car '(a b c)) ==> a)
Where the match predicate is implicit (EQUAL). I like this way better
because it breaks a test down into distinct parts. That makes it
easier, for example, to print an error message like
"Test failed with actual result ~A instead of expected result ~A~%".
If a test is just a lisp form, it will usually look like
(<match-pred> <test-case> <expected-results>), but "usually" isn't enough.
Once you've got test-forms broken down into separate parts, it just
turns out to be convenient to have one of the parts be the match
function and another to be the type of the test (evaluating,
non-evaluating, error-expecting, etc.)
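A rough sketch of running one test written in that three-part style (invented
here, not GSD's actual driver; the match predicate is the implicit EQUAL and the
expected result is not evaluated):

(defun run-arrow-test (test)
  ;; TEST is a three-part form such as ((car '(a b c)) ==> a).
  (destructuring-bind (test-case arrow expected) test
    (assert (string= (symbol-name arrow) "==>"))
    (let ((actual (eval test-case)))
      (or (equal actual expected)
          (format t "~&Test failed with actual result ~A instead of expected result ~A~%"
                  actual expected)))))

;; e.g. (run-arrow-test '((car '(a b c)) ==> a))  =>  T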
Compilation:
I wouldn't put off worrying about issues surrounding compilation.
We did just that, and I'm not pleased with the result. These issues
will affect the whole structure of the test driver, I think, and
ignoring them will, I fear, either lead to throwing away the first
version or living with inadequacy.
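As one hedged illustration of why compilation shapes the driver, a harness might
run each test form through both EVAL and COMPILE and compare the two (this sketch
ignores COMPILE-FILE, multiple values, and forms with side effects):

(defun run-form-both-ways (form)
  ;; Evaluate FORM directly, then again after compiling it, and compare results.
  (let ((interpreted (eval form))
        (compiled (funcall (compile nil `(lambda () ,form)))))
    (unless (equal interpreted compiled)
      (format t "~&Compiled/interpreted mismatch for ~S: ~S vs. ~S~%"
              form interpreted compiled))
    (and interpreted compiled)))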
∂28-Jul-86 1122 berman@vaxa.isi.edu Re: Test suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 28 Jul 86 11:21:23 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA07088; Mon, 28 Jul 86 11:19:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607281819.AA07088@vaxa.isi.edu>
Date: 28 Jul 1986 1119-PDT (Monday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa
Subject: Re: Test suite
In-Reply-To: Your message of Sat, 26 Jul 86 16:49:02 CDT.
<8607262149.AA15192@gswd-vms.ARPA>
I agree about making the testing predicate a separate part of the test form.
This may become more useful for both analysis and test generation at some
point.
As for compilation -- in the test managers I have received, one generally has
the option of running the tests interpreted, compiled, or both. There is not
a compile-file option as yet. I suspect that compile-file should be its own
test, rather than a form of testing. That is, there will undoubtedly be a
mini-suite for testing just compile-file. As well, there should be a general
sub-suite for testing all forms of compilation. While it is ad-hoc to test
the compiler by compiling tests not intended to test the compiler, I freely
admit that more subtle bugs are likely to be revealed in this manner for the
very reason that the tests were not intended specifically for compilation.
Also, there are implementations that only compile, such as ExperLisp.
RB
∂29-Jul-86 1220 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Re: test control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86 10:34:18 PDT
Received: from MACH.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 41040; Tue 29-Jul-86 03:32:12-EDT
Date: Tue, 29 Jul 86 03:31 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: test control
To: berman@vaxa.isi.edu, cfry@OZ.AI.MIT.EDU
cc: cl-validation@SU-AI.ARPA
In-Reply-To: <8607251749.AA17112@vaxa.isi.edu>
Message-ID: <860729033120.1.CFRY@MACH.AI.MIT.EDU>
I sort of thought the notion of a "test unit" would communicate the "N" you
refer to. Let me be more specific. N is 1. But there may be more than one
form. N here refers to the number of tests of the function/topic being
tested. Other forms can set things up, etc. If any form fails, it is THAT
TEST that is reported to have failed, not the entirety of the function/topic.
sounds good.
I would sure like to hear more ideas, and from others too. I think now that I
would modify this testing macro a bit. I think the "test" proper is in 3
parts. A setup, the actual test form, and an un-setup. Obviously only the
test form is required.
I usually consider un-setup to be part of the "setup".
Say a previous test does a (setq foo ...), and the next test is testing
whether boundp works. As part of the setup I would do (makunbound 'foo).
This means that the current test will not have to rely on everybody else doing
the un-setup properly, which is probably what you have to do anyway.
If all of the unsetups work correctly, then the env should be the same before the
test as it is after, right? This is an awful lot of work you're cutting out for yourself.
My proposals in general take into heavy consideration making it easy to write tests,
and making the diagnostic-controlling program for the tests itself
work with just a minimal amount of lisp functioning. It sounds like you're
not operating under the same constraints, but users of the validation suite will be.
I do somewhat like the idea of just using a lisp-form, and if it is supposed
to return some result, just ensure it returns non-nil for "OK". That is,
using your simpler (pred x y) where pred tests the result, x is the test form,
and y is the desired result. I still would like to formalize it somewhat into
something that more clearly shows which is the test form and the required
result, as well as the predicate. See some of the test classes that Marick
describes. Not all of them care for a result, and I would like that to be
more explicit from the layout of the test text.
Ok, I recognize that it's nice to be able to find out the various parts of the test,
rather than just have this amorphous lisp expression that's supposed to return non-nil.
Here's a modified approach that I think will satisfy both of us.
A cheap tester can just evaluate the test and expect to get non-nil.
Most forms will be of the type (pred expression expected-value).
That's pretty simple to parse for error messages and such.
For the don't-care-about-value case, have a function called:
ignore-value.
(defun ignore-value (arg)
  ;; ARG has already been evaluated by the caller; its value doesn't matter.
  (declare (ignore arg))
  t)
If you really need to get explicit, have a function called:
make-test
A call looks like:
(make-test pred exp expected-value
 &key test-id author site-name setup-form un-setup-form error-message compilep ...)
make-test is not quite the right word, because I think evaling it would
perform the test, not just create it. Maybe we should call it
perform-test instead.
If you really want to give a test a name, there could be a fn
def-test whose args are the same as make-test except that inserted at
the front is a NAME.
Anyway the idea is that some hairy database program
can easily go into the call and extract out all the relevant info.
[actually, its not even so hairy:
-setup and unsetup default to NIL.
-if non-list, pred defaults to EQUAL, expected-value defaults to non-nil
-if list, whose car is not DEF-TEST, pred is car,
exp is cadr and expected-value is caddr.
-if list whose car is DEF-TEST, parse as is obvious.]
But some simple program can just run it and it'll do mostly what you want.
the &key args can have appropriate defaults like *site-name* and
*test-author-name*.
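Those defaults amount to something like the little parser below (a guess at the
intent; the DEF-TEST branch is left as a stub since its argument order was only
sketched above):

(defun parse-test-form (form)
  ;; Returns three values: predicate, expression, expected value.
  ;; :NON-NIL marks the "any non-NIL value passes" default.
  (cond ((not (consp form))
         (values 'equal form :non-nil))
        ((eq (car form) 'def-test)
         (values :def-test form nil))    ; hand the whole DEF-TEST form to a fancier parser
        (t
         (values (first form) (second form) (third form)))))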
My point here is let's use the lisp reader and evaluator, not construct
a whole new language with its own syntax with "==>" infix operators,
special names for predicates that duplicate existing cl fns, and such.
Lisp is hip! That's why we're bothering to implement it in the first place!
As for explicit error messages, using:
"The form ~s evaled to ~s but the expected value was ~s."
Seems pretty complete to me. Nothing in my new proposal makes it hard to
implement such an error message.
I am sorry you feel I am being evasive. I could just make arbitrary
decisions, but in fact I am relaying all the information, ideas and activities
as they actually are.
Thanks for your concern. Actually I didn't think you were trying to be evasive,
it's just that you didn't consider that designing the syntax first can often simplify
homing in on the exact functionality of the program.
.....
I haven't thought very hard about being able to use the
same test for both compiling and evaling the expression in question.
I agree with whoever said that this should be worked out.
In my above make-test call, I have a var for compilep.
This could take the values T, NIL, or :BOTH, and maybe even
default to :BOTH.
∂29-Jul-86 1629 berman@vaxa.isi.edu Add to list
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 29 Jul 86 11:11:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA17440; Tue, 29 Jul 86 11:11:33 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607291811.AA17440@vaxa.isi.edu>
Date: 29 Jul 1986 1111-PDT (Tuesday)
To: CL-Validation@SU-AI.ARPA
Cc: Cornish%bravo@ti-csl@CSNET-RELAY.ARPA
Subject: Add to list
I am forwarding the message here I received to the correct person.
RB
------- Forwarded Message
Return-Path: <CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA>
Received: from CSNET-RELAY.ARPA (csnet-pdn-gw.arpa) by vaxa.isi.edu (4.12/4.7)
id AA11007; Mon, 28 Jul 86 17:09:37 pdt
Received: from ti-csl by csnet-relay.csnet id ar02252; 28 Jul 86 19:56 EDT
Received: from Bravo (bravo.ARPA) by tilde id AA12392; Mon, 28 Jul 86 17:08:11 cdt
To: berman@vaxa.isi.edu
Cc:
Subject: CL Validation Mailing List
Date: 28-Jul-86 17:05:11
From: CORNISH%Bravo%ti-csl.csnet@CSNET-RELAY.ARPA
Message-Id: <CORNISH.2731961109@Bravo>
I would like to be added to the CL Validation Suite mailing list.
------- End of Forwarded Message
∂31-Jul-86 0834 marick%turkey@gswd-vms.ARPA Lisp conference
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 31 Jul 86 08:34:35 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA00287; Thu, 31 Jul 86 10:34:01 CDT
Message-Id: <8607311534.AA00287@gswd-vms.ARPA>
Date: Thu, 31 Jul 86 10:33:56 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Lisp conference
Several people interested in CL validation will be at the Lisp
conference. Perhaps it would be a good idea if Richard Berman were to
buy us all lunch. Failing that, perhaps we should go to lunch on our
own tab -- or othertimewise get together.
Brian Marick
∂31-Jul-86 1034 berman@vaxa.isi.edu Re: Lisp conference
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 31 Jul 86 10:34:39 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA05383; Thu, 31 Jul 86 10:33:44 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8607311733.AA05383@vaxa.isi.edu>
Date: 31 Jul 1986 1033-PDT (Thursday)
To: marick%turkey@gswd-vms.ARPA (Brian Marick)
Cc: cl-validation@su-ai.arpa, berman@vaxa.isi.edu
Subject: Re: Lisp conference
In-Reply-To: Your message of Thu, 31 Jul 86 10:33:56 CDT.
<8607311534.AA00287@gswd-vms.ARPA>
As for Richard Berman buying lunch - I don't know how ISI would feel about
that, but I'll check. I am trying to prune my stay to one day, so which day
should it be? I really need to know by today if possible, or Friday morning
at worst. Based on the responses of those interested in the validation
effort, I will decide how long (and which day(s)) to stay.
So when would y'all like to get together?
Best,
RB
∂01-Aug-86 1348 berman@vaxa.isi.edu Conference
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 1 Aug 86 13:48:06 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA15784; Fri, 1 Aug 86 13:47:46 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608012047.AA15784@vaxa.isi.edu>
Date: 1 Aug 1986 1347-PDT (Friday)
To: cl-validation@su-ai.arpa
Cc:
Subject: Conference
Hey gang, I'm going to be at the conference to meet with any and all parties
interested in the Validation effort. I may only be around on Monday (but
Tuesday is a possibility) and I would like to meet for lunch after the morning
session. I assume I'll be wearing some kind of ID badge to identify myself as
Richard Berman from ISI.
I'll bring along a few hardcopies of the ISI proposal outlining our intended
support activities.
I really would like to meet everyone who is working on testing implementations
and other issues like this.
See ya.
RB
∂11-Aug-86 1122 berman@vaxa.isi.edu Thanks
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 11 Aug 86 11:22:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02567; Mon, 11 Aug 86 11:23:02 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608111823.AA02567@vaxa.isi.edu>
Date: 11 Aug 1986 1122-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Thanks
Thanks to the folks I spoke with at the conference. The main thing I got from
this is the concept of an ordering macro to facilitate test groups which must
execute in a specific sequence.
I would like to know whether there are any more comments, questions, suggestions,
etc. regarding the test macro.
RB
∂13-Aug-86 1130 berman@vaxa.isi.edu Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 13 Aug 86 11:29:52 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA10629; Wed, 13 Aug 86 11:30:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608131830.AA10629@vaxa.isi.edu>
Date: 13 Aug 1986 1130-PDT (Wednesday)
To: cl-validation@su-ai.arpa
Cc:
Subject: Test Control
On 29 July Fry proposed a control scheme including a "compilep" option which
would be T, Nil or :BOTH, possibly defaulting to :BOTH. This would be present
for each test.
I feel that this is unnecessary because Common Lisp is supposed to yield the
same results compiled or interpreted. At least, that is my understanding. Are
there any intentional instances where this is not true?
Each test (or ordered series of tests) should be runnable in either form, so I
believe the control for testing compilation should be more global.
What do you think?
RB
∂19-Aug-86 0039 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Test Control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 00:39:39 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43318; Tue 19-Aug-86 03:41:06-EDT
Date: Tue, 19 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Test Control
To: berman@vaxa.isi.edu, cl-validation@SU-AI.ARPA
In-Reply-To: <8608131830.AA10629@vaxa.isi.edu>
Message-ID: <860819034224.7.CFRY@JONES.AI.MIT.EDU>
On 29 July Fry proposed a control scheme including a "compilep" option which
would be T, Nil or :BOTH, possibly defaulting to :BOTH. This would be present
for each test.
I feel that this is unnecessary because Common Lisp is supposed to yield the
same results compiled or interpreted. At least, that is my understanding. Are
there any intentional instances where this is not true?
Well, modulo some recent debate, macro-expand time is different.
Effectively, macro-expand time for compiled functions is the same as definition time.
But for evaled fns, macro-expand time is the same as run time.
But basically you're right. So long as we can easily run a whole set of tests
either evaled, compiled, or both, we don't need to indicate that in each test.
The error messages should definitely say whether the call failed in compiled or
evaled mode.
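A tiny illustration of that macro-expansion point (the names are made up, and
the interpreted behavior described is the common one rather than something the
language guarantees):
(defvar *phase* 'compile-time)

(defmacro phase-when-expanded ()
  ;; the expansion bakes in whatever *PHASE* is at macro-expansion time
  `',*phase*)

(defun probe () (phase-when-expanded))
(compile 'probe)          ; for the compiled definition, expansion happens here

(setq *phase* 'run-time)
(probe)                   ; => COMPILE-TIME when compiled; an interpreter that
                          ;    expands at call time would return RUN-TIME instead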
Each test (or ordered series of tests) should be runnable in either form, so I
believe the control for testing compilation should be more global.
What do you think?
RB
In my diagnostic system, I'd like to have the local control.
One reason is so that I can explicitly label a test that has a bug in it.
[and maybe only the compiled version of a call would have the bug.]
If there were a convenient syntax for declaring a test
evaled, compiled, both, or under global control [with global control being the default,
and with BOTH being the global-control's default]
then I'd make use of it.
∂19-Aug-86 1135 berman@vaxa.isi.edu Re: Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 11:35:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA12279; Tue, 19 Aug 86 11:35:26 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608191835.AA12279@vaxa.isi.edu>
Date: 19 Aug 1986 1135-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Subject: Re: Test Control
In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
<860819034224.7.CFRY@JONES.AI.MIT.EDU>
Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
specification of compiled, evaled or both (for testing), I like it.
I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
:COMPILE or :EVAL, where :GLOBAL means that the global test controller will
decide whether the test is compiled and/or evaled, and the other two values are
a "compile only" or "eval only" specifier, overriding the global control. I
don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
meaning that the test may be compiled and/or evaled.
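In a test definition this might read roughly as follows (the surrounding syntax
is not settled; only the :CONTROL part is the actual proposal):
(deftest (+ plus-fixnum-1)
  ((+ 1 2) eql 3)
  :control :global)        ; or :compile / :eval to override the global controller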
NOTE: I am experimenting with a macro now that includes all the best features
we seem to have agreed upon. I am including the above feature, but naturally
it can be changed. In a few days I will post this preliminary macro. It is
not really a control macro, but simply defines the test in terms of the data
base. Currently I am using generic common-lisp for this organizing macro, and
I am not using FSD. Instead it creates a simpler database using lists, arrays
and property lists. This database is for testing only and the actual
organizing macro may stray from pure CL because it is intended for internal
use only. Of course, the files generated from the database will contain only
"pure" CL for testing purposes.
RB
∂20-Aug-86 0604 hpfclp!hpfcjrd!diamant@hplabs.HP.COM Re: Test Control
Received: from HPLABS.HP.COM by SAIL.STANFORD.EDU with TCP; 20 Aug 86 06:03:39 PDT
Received: by hplabs.HP.COM ; Wed, 20 Aug 86 04:43:35 pdt
From: John Diamant <hpfclp!hpfcjrd!diamant@hplabs.HP.COM>
Received: from hpfcjrd.UUCP; Tue, 19 Aug 86 13:26:12
Received: by hpfcjrd; Tue, 19 Aug 86 13:26:12 mdt
Date: Tue, 19 Aug 86 13:26:12 mdt
To: cl-validation@sail.stanford.edu
Subject: Re: Test Control
> Subject: Test Control
> From: Christopher Fry <hplabs!cfry@OZ.AI.MIT.EDU>
>
> Well, modulo some recent debate, macro-expand time is different.
> Effectively, macro-expand time for compiled functions is the same as definition time.
> But for evaled fns, macro-expand time is the same as run time.
For evaled functions, it is unspecified in Common Lisp. This has been
discussed at great length on the CL mailing list, so I won't repeat it here,
but this is a potential source for problems in test runs. If an implementation
chooses to handle macro expansion the way you suggest (most do), then the
semantics truly are different. On our implementation, where we chose to
have consistent interpreter and compiler semantics with regard to
macroexpansion, any problems we encountered with expansion time were the same
whether we ran interpreted or compiled.
John Diamant
Systems Software Operation UUCP: {ihnp4!hpfcla,hplabs}!hpfclp!diamant
Hewlett Packard Co. ARPA/CSNET: diamant%hpfclp@hplabs.HP.COM
Fort Collins, CO
∂21-Aug-86 1352 berman@vaxa.isi.edu Purpose of Test Suite
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86 13:52:46 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA29758; Thu, 21 Aug 86 13:53:10 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608212053.AA29758@vaxa.isi.edu>
Date: 21 Aug 1986 1353-PDT (Thursday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Purpose of Test Suite
Now that I have been experimenting a bit, I have come up against a question
that is a bit difficult to decide upon. From my understanding, I am putting
together a VALIDATION suite, the purpose of which is to determine the presence
and operating status of all the CL functions, variables, features, etc.
Is it also supposed to thoroughly test these things?
That is, is this same suite responsible for determining such things as correct
operation at boundary conditions? How about esoteric interactions?
In the test of the "+" operation, what would you include? Obviously you want
to be sure that it works for each data type (and combination of data types)
that it is defined for. Also you want to make sure that positive/negative is
handled, etc. Beyond that, should it also check whether, for example,
MOST-POSITIVE-FIXNUM + 1 causes an error? And whether (+ 1 (1-
MOST-POSITIVE-FIXNUM)) causes no error? And so on for each of the number-type
boundaries.
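Concretely, the forms in question would be something along these lines (whether
they belong in the suite is the open question; under CLtL, integer arithmetic is
unbounded, so presumably the suite would check that these all return T without
signalling an error):
(= (+ most-positive-fixnum 1) (1+ most-positive-fixnum))    ; crosses into bignum range
(> (+ most-positive-fixnum 1) most-positive-fixnum)
(= (+ 1 (1- most-positive-fixnum)) most-positive-fixnum)    ; stays a fixnum
(typep (+ 1 (1- most-positive-fixnum)) 'fixnum)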
RB
∂21-Aug-86 1738 FAHLMAN@C.CS.CMU.EDU Purpose of Test Suite
Received: from C.CS.CMU.EDU by SAIL.STANFORD.EDU with TCP; 21 Aug 86 17:38:45 PDT
Received: ID <FAHLMAN@C.CS.CMU.EDU>; Thu 21 Aug 86 20:37:13-EDT
Date: Thu, 21 Aug 1986 20:37 EDT
Message-ID: <FAHLMAN.12232694383.BABYL@C.CS.CMU.EDU>
Sender: FAHLMAN@C.CS.CMU.EDU
From: "Scott E. Fahlman" <Fahlman@C.CS.CMU.EDU>
To: berman@vaxa.isi.edu (Richard Berman)
Cc: CL-Validation@SU-AI.ARPA
Subject: Purpose of Test Suite
In-reply-to: Msg of 21 Aug 1986 16:53-EDT from berman at vaxa.isi.edu (Richard Berman)
I agree that this is supposed to be a validation suite, and not a
comprehensive debugging suite. It should test that everything is there,
that it basically all works, and should especially stress those things
that might be the subject of misunderstandings. It is necessary to test
whether you can add a flonum to a bignum; it is not necessary to
test a few thousand pairs of random integers to make sure that the +
operator works for all of them.
-- Scott
∂22-Aug-86 0124 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Purpose of Test Suite
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43711; Fri 22-Aug-86 01:50:01-EDT
Date: Fri, 22 Aug 86 01:49 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Purpose of Test Suite
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608212053.AA29758@vaxa.isi.edu>
Message-ID: <860822014957.9.CFRY@JONES.AI.MIT.EDU>
Now that I have been experimenting a bit, I have come up against a question
that is a bit difficult to decide upon. From my understanding, I am putting
together a VALIDATION suite, the purpose of which is to determine the presence
and operating status of all the CL functions, variables, features, etc.
Is it also supposed to thoroughly test these things?
If there's much of a difference, we're in big trouble.
If somebody's implementation supports adding of all
integers except (+ 27491 -31200001), we can't be expected to find that out with the
validation suite.
That is, is this same suite responsible for determining such things as correct
operation at boundary conditions? How about esoteric interactions?
In the test of the "+" operation, what would you include? Obviously you want
to be sure that it works for each data type (and combination of data types)
that it is defined for. Also you want to make sure that positive/negative is
handled, etc. Beyond that, should it also check to see if, for example,
MOST-POSITIVE-FIXNUM + 1 causes an error? How about (+ 1 (1-
MOST-POSITIVE-FIXNUM)) causes no error? And so on for each of the number-type
boundaries.
I think the broader question you're asking is:
Should the validation suite simply test that things work the way they're supposed to
when they're supposed to, or should it also make sure that things DON'T WORK when they're
not supposed to work?
You can obviously expand either category to available memory.
For + on non-negative integers, I'd test:
(+)
(+ 0)
(+ 0 0)
(+ 2 3 4 5 6 7)
(+ nil) => should error
(+ "one") => another error case wouldn't hurt
Checking the cases using most-positive-fixnum is
a good idea and does appear to be necessary.
It's a lot of nit-picking work, though.
I'm glad I'm in MY sandals.
∂22-Aug-86 0125 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU Re: Test Control
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 01:24:29 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 43710; Fri 22-Aug-86 01:39:48-EDT
Date: Fri, 22 Aug 86 01:39 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: Re: Test Control
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608191835.AA12279@vaxa.isi.edu>
Message-ID: <860822013939.8.CFRY@JONES.AI.MIT.EDU>
Received: from MC.LCS.MIT.EDU by OZ.AI.MIT.EDU via Chaosnet; 19 Aug 86 14:51-EDT
Received: from SAIL.STANFORD.EDU by MC.LCS.MIT.EDU 19 Aug 86 14:48:11 EDT
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 19 Aug 86 11:35:02 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA12279; Tue, 19 Aug 86 11:35:26 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608191835.AA12279@vaxa.isi.edu>
Date: 19 Aug 1986 1135-PDT (Tuesday)
To: CL-Validation@su-ai.arpa
Subject: Re: Test Control
In-Reply-To: Your message of Tue, 19 Aug 86 03:42 EDT.
<860819034224.7.CFRY@JONES.AI.MIT.EDU>
Re: Fry's idea of having a flag for GLOBAL/LOCAL control, with LOCAL allowing
specification of compiled, evaled or both (for testing), I like it.
I suggest that the flag be a keyword called :CONTROL with the values :GLOBAL,
:COMPILE or :EVAL, where :GLOBAL means that the global test controller will
decide whether the test is compiled and/or evald, and the other two values are
a "compile only" or "eval only" specifier, overriding the global control.
Almost right.
I
don't think that :BOTH is necessary as this seems to be identical to :GLOBAL,
meaning that the test may be compiled and/or evaled.
Nope. :GLOBAL should mean: get the kind of testing from the global variable
*global-test-kind*, which may take on the values:
:eval, :compile, or :both.
The question is, should the local version be able to say :compile when the global
version says :eval and vice versa?
Maybe in that case, that test would simply not get run.
[Say, something that only works compiled, and you're running all the tests
knowing that the compiler is completely broken, so don't run any compiled tests.]
Maybe GLOBAL should have precedence?
I know you say everything should work both compiled and evaled, and for
strictly VALIDATION purposes, you shouldn't need any of this.
But it would be useful if the same format for validation was
useful for code development. For one thing, it would simply get used
more and we'd get more validation tests.
For another, it would help developers.
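One way a runner could resolve local against global control (just a sketch of
the semantics I have in mind; nothing here is implemented):
(defvar *global-test-kind* :both)     ; :eval, :compile, or :both

(defun test-modes (local-control)
  ;; returns the list of modes a test should actually be run in,
  ;; or NIL meaning "skip this test for now"
  (let ((global (if (eq *global-test-kind* :both)
                    '(:eval :compile)
                    (list *global-test-kind*))))
    (case local-control
      (:global global)                          ; defer entirely to the controller
      ((:eval :compile)                         ; local restriction: run only if the
       (and (member local-control global)       ; controller also allows that mode
            (list local-control))))))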
NOTE: I am experimenting with a macro now that includes all the best features
we have seemed to agree upon. I am including the above feature, but naturally
it can be changed. In a few days I will post this preliminary macro. It is
not really a control macro, but simply defines the test in terms of the data
base. Currently I am using generic common-lisp for this organizing macro, and
I am not using FSD.
Right on!
Instead it creates a simpler database using lists, arrays
and property lists. This database is for testing only and the actual
organizing macro may stray from pure CL because it is intended for internal
use only. Of course, the files generated from the database will contain only
"pure" CL for testing purposes.
sounds good.
∂22-Aug-86 1054 berman@vaxa.isi.edu Re: Test Control
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 22 Aug 86 10:54:33 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA06074; Fri, 22 Aug 86 10:54:47 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608221754.AA06074@vaxa.isi.edu>
Date: 22 Aug 1986 1054-PDT (Friday)
To: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Cc: CL-Validation@SU-AI.ARPA
Subject: Re: Test Control
In-Reply-To: Your message of Fri, 22 Aug 86 01:39 EDT.
<860822013939.8.CFRY@JONES.AI.MIT.EDU>
I am still not sure why :BOTH is needed. I believe that the purpose here is
to have individual tests be able to specify a limitation on how they may be
run. Obviously the vast majority of tests can be run either :COMPILEd or
:EVALed. It is only the rare test that must limit this with the inclusion of
a :EVAL or :COMPILE option. I recommend changing these names to :EVAL-ONLY
and :COMPILE-ONLY to clarify the meanings.
The test controller could be told to run every test compiled, evaled, or both.
Perhaps it would be useful to also say "run only the EVAL-ONLY tests", etc.
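Picking out such a subset would be easy if each test name records its control
option somewhere, e.g. on its property list (the bookkeeping names below are
only hypothetical):
(defun tests-with-control (kind)
  ;; e.g. (tests-with-control :eval-only)
  (remove-if-not #'(lambda (name) (eq (get name 'test-control) kind))
                 *list-of-test-names*))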
Does this seem useful? If not, please clarify for me just how :BOTH is
different from the union of :EVAL and :COMPILE.
Thanks
RB
∂24-Aug-86 1940 marick%turkey@gswd-vms.ARPA Purpose of Test Suite
Received: from GSWD-VMS.ARPA by SAIL.STANFORD.EDU with TCP; 24 Aug 86 19:39:54 PDT
Received: from turkey.GSD (turkey.ARPA) by gswd-vms.ARPA (5.51/)
id AA10093; Sun, 24 Aug 86 21:39:51 CDT
Message-Id: <8608250239.AA10093@gswd-vms.ARPA>
Date: Sun, 24 Aug 86 21:40:22 CDT
From: marick%turkey@gswd-vms.ARPA (Brian Marick)
To: berman@vaxa.isi.edu, cl-validation@su-ai.arpa
Subject: Purpose of Test Suite
The validation suite should check that a Common Lisp system adheres to
the letter of the specification. I don't see that that's particularly
different from any test suite.
Of course, you quickly run into combinatorial explosion, so you have to
narrow your scope. Checking boundary conditions is known to be an
awfully productive way of testing, both because programmers often make
errors around boundaries and also because boundary condition tests can
be written quickly, without much thought.
Once the next version of the CL definition is available, it might be
useful to use it to drive the test suite. I could see something like
this:
Each "unit" of specification would contain a pointer to the appropriate
test. For example, the specification for #'+ will say that it takes 0
or more arguments. That sentence will point to a test that gives #'+
0 arguments and Lambda-Parameters-Limit arguments (the boundary
conditions). The FSD database ought to be able to support this.
It might also be useful to have a list of stock values to use for
testing. Each datatype contains classes of "equivalent values", and
these stock values would be the boundary values. For example, the stock
values for type fixnum would be most-negative-fixnum, -1, 0, +1, and
most-positive-fixnum. In some string tests I whipped off not too long
back, I used three stock strings: a simple-string, a string with one
level of displacement, a string with two levels of displacement,
including a displacement offset and a fill-pointer. (Guess what I was
testing.) These stock values have the advantage that they eliminate
some of the thinking required per test. The disadvantage is that they
institutionalize gaps in your test coverage.
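A table of stock values could be as simple as something like this (the names
and the particular values are only illustrative):
(defparameter *stock-values*
  ;; boundary representatives for a few types; each test draws its
  ;; arguments from whichever equivalence classes it cares about
  `((fixnum ,most-negative-fixnum -1 0 1 ,most-positive-fixnum)
    (string "" "a" ,(make-string 100 :initial-element #\x))
    (list   nil (a) (a b c))))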
I don't know that this is practical at this late date.
Brian Marick
∂25-Aug-86 1221 berman@vaxa.isi.edu TEST MACRO
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:20:45 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02269; Mon, 25 Aug 86 12:21:34 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251921.AA02269@vaxa.isi.edu>
Date: 25 Aug 1986 1221-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: TEST MACRO
Here is the current version of the test macro stuff. Note that this is an
organizing macro, to create the database. The variable LIST-OF-ITEMS is not
defined here - it contains a listing of all the CL function, macro, variable
names, etc.
I am not 100% happy with the current version, and I look forward to your
suggestions. Remember, this creates a data base. The main requisite is that
this macro must embody all the necessary info for the management and running
of the tests. My next message will contain some samples.
;; -*- Mode:Common-Lisp; Base: 10; Package:cl-tests -*-
(in-package 'cl-tests)
(defvar *list-of-test-names* nil)
(defvar *list-of-test-seq-names* nil)
; ADD-TEST does the work of putting the test into the database.
; It does not do any testing.
; NOTE: This version is for testing. It does not use FSD,
; but should work in any Common Lisp. See DEFTEST for
; descriptions of the arguments.
(defmacro add-test (item name type contrib$ setup testform unsetup
failform error$ doc$ name-add control)
(putprop name item 'test-of) ; note what it is a test of.
(putprop name type 'test-type)
(putprop name contrib$ 'test-contributor)
(putprop name setup 'test-setup)
(putprop name testform 'test-form)
(putprop name unsetup 'test-unsetup)
(putprop name failform 'test-failform)
(putprop name error$ 'test-error$)
(putprop name doc$ 'test-doc$)
(putprop name control 'test-control)
(and name-add
(putprop item (cons name (get item 'tests)) 'tests)
(push name *list-of-test-names*))
`',name)
; DEFTEST is used to define a test. It puts the test into a database.
; The arguments are:
; ITEM which is one of the common lisp function names, variables, macro names,
; etc. or a subject name. The name must be present in the organizing
; database.
; NAME must be a unique symbol for this test.
; TYPE is optional, defaulting to NOEVAL. It must be one of NOEVAL,
; EVAL or ERROR. NOEVAL means the testform eval section is
; evaluated and compared (using the indicated compare in the testform)
; with the unevaluated compare section. EVAL means both halves
; are evaluated and compared. ERROR means the form should produce
; an error.
; TESTFORM is the test form, composed of 1 or 3 parts. If this is
; an ERROR test, TESTFORM is an expression which must produce
; an error. Otherwise there are 3 parts. The first is the
; eval form, which is evaluated. The second is a form which
; can be used as a function by APPLY, taking two arguments and
; used to compare the results of the eval form with the third
; part of the TESTFORM, the compare form. The compare form is
; either evaluated (type EVAL) or not (type NOEVAL).
; The remaining arguments are optional, referenced by keywords. They are:
; :CONTRIB$ is a documentation string showing the originator of the test.
; If unspecified or NIL it gets its value from CL-TESTS:*CONTRIB$*
; :FAILFORM is a form to evaluate in the event that an unexpected error
; was generated, or the comparison failed.
; :ERROR$ is a string to print out if the comparison fails.
; :SETUP is a form to evaluate before TESTFORM.
; :UNSETUP is a form to evaluate after TESTFORM.
; :DOC$ is a string documenting this test. If not specified (or nil) it
; gets its value from the global variable CL-TESTS:*DOC$*
; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE. If it is :GLOBAL,
; it means that the test controller will decide when/if to eval and
; compile the test. If it is :EVAL, then the test will ignore
; controller attempts to compile it, and if it is :COMPILE the
; controller cannot eval it. The default is :GLOBAL.
(defvar *CONTRIB$* nil)
(defvar *DOC$* nil)
(defmacro DEFTEST ((item name &optional (type 'noeval)) testform
&key (contrib$ *contrib$*) (failform nil) (error$ nil) (setup nil)
(unsetup nil) (doc$ nil) (name-add t)(control :GLOBAL))
(cond ((null(memq item list-of-items))
(error "'~s' is not a CL item or subject.~%" item))
((null(memq type '(noeval eval error)))
(error "The test-type ~s is not one of NOEVAL, EVAL or ERROR."))
((null(stringp contrib$))
(error "The contributor, ~s, must be a string." contrib$))
((null(or (null error$) (stringp error$)))
(error ":ERROR$ must be a string."))
((null (or (null doc$) (stringp doc$)))
(error ":DOC$ must be a string."))
((null (memq control '(:GLOBAL :EVAL :COMPILE)))
(error ":CONTROL must be one of :GLOBAL, :EVAL or :COMPILE."))
((memq name *list-of-test-names*)
(error "The test name ~s has already been used!" name)))
`(add-test ,item ,name ,type ,contrib$
,setup ,testform ,unsetup ,failform ,error$
,(or doc$ *doc$*) ,name-add ,control)) ; put it on the item.
; The format for test sequences is:
; (DEFTEST-SEQ (item seq-name)
; (((test-name <type>) testform <key-word data>)
; ((test-name <type>) testform <key-word data>) ... )
; :CONTRIB$ <contributor-string>
; :SETUP <setup form>
; :UNSETUP <unsetup form>
; :DOC$ <documentation string>
(defmacro add-test-seq (item seq-name test-names contrib$ setup unsetup doc$)
(putprop seq-name item 'test-seq-of)
(putprop seq-name contrib$ 'test-seq-contributor)
(putprop seq-name setup 'test-seq-setup)
(putprop seq-name test-names 'test-seq-names)
(putprop seq-name unsetup 'test-seq-unsetup)
(putprop seq-name doc$ 'test-seq-doc$)
(putprop item (nconc (get item 'test-seqs) (list seq-name)) 'test-seqs)
(push seq-name *list-of-test-seq-names*)
`',seq-name)
(defmacro add-1-seq (item a-test contrib$)
`(deftest (,item ,@ (car a-test))
,(second a-test)
:contrib$ , contrib$
,@ (cddr a-test)
:name-add nil))
(defmacro DEFTEST-SEQ ((item seq-name) test-seq
&key (contrib$ *contrib$*) (setup nil) (unsetup nil) (doc$ *doc$*))
(cond ((null(memq item list-of-items))
(error "'~s' is not a CL item or subject.~%" item))
((null(stringp contrib$))
(error "The contributor must be a string."))
((null (or (null doc$) (stringp doc$)))
(error ":DOC$ must be a string."))
((memq seq-name *list-of-test-seq-names*)
(error "The test-sequence name ~s has already been used!" seq-name)))
(let (test-names)
(dolist (a-test test-seq)
(setq test-names
(nconc test-names
(list (eval `(add-1-seq ,item ,a-test ,contrib$))))))
`(add-test-seq ,item
,seq-name
,test-names
,contrib$
,setup
,unsetup
,doc$)))
∂25-Aug-86 1225 berman@vaxa.isi.edu Test-Macro examples
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:24:41 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02290; Mon, 25 Aug 86 12:25:39 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251925.AA02290@vaxa.isi.edu>
Date: 25 Aug 1986 1225-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Test-Macro examples
Here are some samples. They are transliterated from the CDC test suite, so
please, no flames over content.
;; -*- Mode:Common-Lisp; Base: 10; Package:cl-tests -*-
(in-package 'cl-tests)
;*******************************************************************
;; ACONS test.
(setq *contrib$* "CDC. Test case written by Richard Hufford.")
(setq *doc$* nil)
(deftest
(acons acons-1)
((acons 'frog 'amphibian nil) equal ((frog . amphibian)))
:doc$ "ACONS to NIL")
(deftest
(acons acons-2)
((acons 'frog
'amphibian
'((duck . bird)(goose . bird)(dog . mammal)))
equal
((frog . amphibian)(duck . bird)(goose . bird)(dog . mammal)))
:doc$ "acons to a-list")
(deftest
(acons acons-3)
((acons 'frog nil nil) equal ((frog)))
:doc$ "acons nil datum")
(deftest
(acons acons-4)
((acons 'frog
'(amphibian warts webbed-feet says-ribbet)
nil)
equal
((frog . (amphibian warts webbed-feet says-ribbet))))
:doc "acons with list datum")
;*******************************************************************
;; ACOSH test.
(deftest-seq
(acosh cdc-acosh-tests)
(((acosh-1)
((ACOSH 1.0000) ACOSH-P 0.0000))
((acosh-2)
((ACOSH 1.0345) ACOSH-P 0.26193))
((acosh-3)
((ACOSH 1.1402) ACOSH-P 0.5235))
((acosh-4)
((ACOSH 1.3246) ACOSH-P 0.7854))
((acosh-5)
((ACOSH 1.6003) ACOSH-P 1.0472))
((acosh-6)
((ACOSH 1.9863) ACOSH-P 1.3090))
((acosh-7)
((ACOSH 2.5092) ACOSH-P 1.5708))
((acosh-8)
((ACOSH 3.2051) ACOSH-P 1.8326))
((acosh-9)
((ACOSH 4.1219) ACOSH-P 2.0944))
((acosh-10)
((ACOSH 5.3228) ACOSH-P 2.3562))
((acosh-11)
((ACOSH 6.8906) ACOSH-P 2.6180))
((acosh-12)
((ACOSH 8.9334) ACOSH-P 2.8798))
((acosh-13)
((ACOSH 11.5920) ACOSH-P 3.1416))
((acosh-14)
((ACOSH 15.0497) ACOSH-P 3.4034))
((acosh-15)
((ACOSH 19.5448) ACOSH-P 3.6652))
((acosh-16)
((ACOSH 25.3871) ACOSH-P 3.9270))
((acosh-17)
((ACOSH 32.9794) ACOSH-P 4.1888))
((acosh-18)
((ACOSH 42.8450) ACOSH-P 4.4506))
((acosh-19)
((ACOSH 55.6640) ACOSH-P 4.7124))
((acosh-20)
((ACOSH 72.3200) ACOSH-P 4.9742))
((acosh-21)
((ACOSH 93.9611) ACOSH-P 5.2360)))
:setup (DEFUN ACOSH-P (ARG1 ARG2)
(PROG (RES)
(COND ((= ARG1 ARG2) (RETURN T))
((= ARG2 0.0) (RETURN (AND (> ARG1 -1E-9)
(< ARG1 1E-9))))
(T (SETQ RES (/ ARG1 ARG2))
(RETURN (AND (> RES 0.9999)
(< RES 1.0001)))))))
:unsetup (fmakunbound 'acosh-p)
:contrib$ "CDC. Test case written by BRANDON CROSS, SOFTWARE ARCHITECTURE AND ENGINEERING"
:doc$ nil)
∂25-Aug-86 1255 berman@vaxa.isi.edu Purpose
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 25 Aug 86 12:55:26 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA02551; Mon, 25 Aug 86 12:56:25 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608251956.AA02551@vaxa.isi.edu>
Date: 25 Aug 1986 1256-PDT (Monday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: Purpose
From Fahlman I get that the purpose is basically to see that the spec (or
whatever) is checked, rather than a sweepingly deep test. Marick seems to
feel that checking "that a Common Lisp system adheres to the letter of the
specification" is not "particularly different from any test suite". Yet it
seems that a vendor's test suite (and I have reviewed about 6 major ones now)
is designed more towards testing both adherence to the spec and specific
areas of interest/problems in that implementation.
Marick's comments re "stock values" seem somewhat useful. Certainly adding
zero and -1 is sufficient to test the handling of both zero and -1 for
addition. I don't then need to add -7 and 2 to test for correct handling of
negatives. Fahlman basically said that testing the functions for each of the
data types it should handle was important. I think that this (data type
handling) and boundary conditions pretty much sum up the nature of the
validation suite which therefore should:
1. Test for the presence of all Common Lisp pre-defined objects.
2. Test for correct definition by:
a. Testing for the data type (e.g. Function, Constant, etc.) of each
of these objects.
b. Evaluating constants and variables for correct value.
c. Applying functions/macros to a sufficiently broad range of
arguments so as to ascertain the functionality for each type of argument and
combination of types.
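As a rough illustration of 1 and 2a, ordinary CL predicates suffice (the
classification keywords below are just for the example):
(defun check-presence (item kind)
  ;; T if ITEM is present as the indicated kind of predefined object
  (ecase kind
    (:function (and (fboundp item) (not (macro-function item))))
    (:macro    (not (null (macro-function item))))
    (:variable (boundp item))
    (:constant (and (boundp item) (constantp item)))))

(check-presence 'car :function)                     ; => T
(check-presence 'push :macro)                       ; => T
(check-presence 'most-positive-fixnum :constant)    ; => T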
Also, a few interaction tests are in order. By this I mean the testing of
more complex forms, and I am thinking specifically of scoping.
Obviously this test suite will not cover in any way extensions made to the
language. I know that such things as error handling and object oriented
programming are being addressed, but so far these very important areas remain
undetermined. Should I also make this same data base (and its corresponding
test-file making utilities, etc.) available for this vendor-specific use? I
don't even know if I CAN do this without some kind of semi-legal hassle
because at present all contributions are public domain. But it would be nice
to have the same test format for everything.
As I must use FSD, I cannot easily give away the actual database stuff. So
far it is all in straight CL, but this is only because FSD is not yet running
on the TI explorer. This is imminent, but I will try (no promise) to keep
some kind of CL version of the database stuff around. If it gets too complex
(which is what FSD is good at handling) I may have to cease working on a
straight CL version.
So.............
Comments???? Is this the correct statement of the purpose and direction I
should use in putting this thing together?
Thanks.
RB
∂27-Aug-86 0041 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU TEST MACRO
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86 00:41:02 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44090; Wed 27-Aug-86 03:43:31-EDT
Date: Wed, 27 Aug 86 03:42 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608251921.AA02269@vaxa.isi.edu>
Message-ID: <860827034240.5.CFRY@JONES.AI.MIT.EDU>
(defmacro add-test (item name type contrib$ setup testform unsetup
failform error$ doc$ name-add control)
(putprop name item 'test-of) ; note what it is a test of.
(putprop name type 'test-type)
(putprop name contrib$ 'test-contributor)
(putprop name setup 'test-setup)
(putprop name testform 'test-form)
(putprop name unsetup 'test-unsetup)
(putprop name failform 'test-failform)
(putprop name error$ 'test-error$)
(putprop name doc$ 'test-doc$)
Usually doc should default to something
(putprop name control 'test-control)
(and name-add
(putprop item (cons name (get item 'tests)) 'tests)
(push name *list-of-test-names*))
`',name)
; TESTFORM is the test form, composed of 1 or 3 parts. If this is
; an ERROR test, TESTFORM is an expression which must produce
; an error. Otherwise there are 3 parts. The first is the
; eval form, which is evaluated. The second is a form which
; can be used as a function by APPLY, taking two arguments and
; used to compare the results of the eval form with the third
; part of the TESTFORM, the compare form.
I prefer Lisp syntax: compare form first! Then test form, then expected result.
Make it look like a function call, i.e. a list of 3 elements.
Infix is good for mathematicians who don't understand elegant syntax.
The compare form is
; either evaluated (type EVAL) or not (type NOEVAL).
Always evaluate it. Specify no-eval by putting a quote in front of it!
[not necessary in case it's a number, string, character, keyword, etc.]
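That is, every testform would read like an ordinary call:
(equal (acons 'frog 'amphibian nil) '((frog . amphibian)))   ; literal expected value: quote it
(eql (+ 2 2) (* 2 2))                                        ; computed expected value: just evaluate it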
; :FAILFORM is a form to evaluate in the event that an unexpected error
; was generated, or the comparison failed.
How about having the default print to *error-output* a composed message like:
"In test FROBULATOR, (foo) should have returned 2 but returned 3 instead."
; :ERROR$ is a string to print out if the comparison fails.
Do we need both failform and error$ ?
If the test fails, evaluate the value of :failform, which prints out the standard message.
It's rare that you'd want to do something other than the default.
Maybe it would be good to have the default behavior come from
global var *test-fail-action*, so someone could generate their own
format of reporting bugs.
; :SETUP is a form to evaluate before TESTFORM.
; :UNSETUP is a form to evaluate after TESTFORM.
; :DOC$ is a string documenting this test. If not specified (or nil) it
; gets it value from the global variable CL-TESTS:*DOC$*
Which itself defaults to "" .
; :CONTROL may be any of :GLOBAL, :EVAL or :COMPILE. If it is :GLOBAL,
; it means that the test controller will decide when/if to eval and
; compile the test. If it is :EVAL, then the test will ignore
; controller attempts to compile it, and if it is :COMPILE the
; controller cannot eval it. The default is :GLOBAL.
Sounds good. Actually your names of :eval-only and :compile-only are
clearer, but just so long as we all agree upon the semantics.
; (DEFTEST-SEQ (item seq-name)
I'd hope most of the time to never have to see a call to
deftest-seq. Something should just go over a whole file
and make it one big call to deftest-seq.
But it's nice to have for obscure cases and non-file modularity.
I notice some dollar sign suffixes in the code.
How about a DIAG package to avoid name conflicts?
Of course, the package system has to be working for you to run your
diagnostics, but ...
∂27-Aug-86 1211 berman@vaxa.isi.edu TEST MACRO - Fry's Comments
Received: from VAXA.ISI.EDU by SAIL.STANFORD.EDU with TCP; 27 Aug 86 12:11:01 PDT
Received: by vaxa.isi.edu (4.12/4.7)
id AA18922; Wed, 27 Aug 86 12:11:08 pdt
From: berman@vaxa.isi.edu (Richard Berman)
Message-Id: <8608271911.AA18922@vaxa.isi.edu>
Date: 27 Aug 1986 1211-PDT (Wednesday)
To: CL-Validation@su-ai.arpa
Cc:
Subject: TEST MACRO - Fry's Comments
doc$ DOES default to a global value.
As for TESTFORM -- it is already Lisp syntax, with the following proviso: it must
be of the form (predicate arg1 arg2), where predicate is an object which can be
applied to arg1 and arg2. E.g. (not (eq arg1 arg2)) is no good, but (neq arg1
arg2) is OK. The exception (per the comments in my code) is an ERROR type of
test.
"Always evaluate it [the compare form]". I took my current default
from Marick (that is, the compare form is not evaluated unless you specify
EVAL) after looking over a lot of different companies' test suites. By FAR
the vast majority of tests were of the NOEVAL variety. This will almost
certainly stand as the default.
:FAILFORM is very different from ERROR$. Per my original posting regarding
the macro, FAILFORM is optional (and will be rarely used at this point). It
is to help analyze an error (or pattern of errors) further. It is used for
testing beyond the "first order", where "first order" means simple error
testing. For example, one may wish for a :FAILFORM to maintain a list of
tests that have failed for a later analysis. :ERROR$ is simply a message to
print out. Actually, it might be nice if :ERROR$ was a format string with
some kind of argument capability, but this may be dangerous in a testing
environment since FORMAT is such a hairy function.
I like the idea of a global default *TEST-FAIL-ACTION*. I would then add an
:IF-FAIL keyword. This is different from :FAILFORM in that :FAILFORM is sort
of an :AFTER mix-in for the standard test-fail-action (or maybe a :BEFORE???,
any preference?) rather than a replacement for the standard fail action.
:IF-FAIL would therefore allow one to replace the standard test-fail action,
while :FAILFORM would be the "mix-in" to the fail action. This is a useful
separation, especially when prototyping tests where :FAILFORM may not change
at the same rate as :IF-FAIL. I hope this paragraph is clear. Whew.
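A sketch of how the three pieces might fit together when a test fails (the
names follow the discussion above; none of this is implemented yet):
(defvar *test-fail-action*
  ;; default failure reporter: a function of the test name and a description
  #'(lambda (name description)
      (format *error-output* "~&Test ~S failed: ~A~%" name description)))

(defun report-failure (name description &key if-fail failform)
  (funcall (or if-fail *test-fail-action*) name description)  ; :IF-FAIL replaces the default action
  (when failform (eval failform)))                            ; :FAILFORM runs afterward, as the mix-in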
Yeah, we'll go to :COMPILE-ONLY and :EVAL-ONLY, with the previously defined
semantics, ok?
As for DEFTEST-SEQ...it is very necessary, and came about as a direct result
of working with existing test suites. This is used when you have auxiliary
functions, macros, variables, etc., which must exist at the time the
sequence of tests is run. It is not always used just for ordering tests. For
example, in the CDC suite they have a function for comparing two numbers
within a certain tolerance which is used as part of the test for #'+. All the
tests of #'+ use this as the predicate. So, all the #'+ tests are wrapped in
a DEFTEST-SEQ with the definition of this predicate in the :SETUP slot. In
this case, the actual temporal sequence of the tests is unimportant. Another
use for DEFTEST-SEQ is when the test sequence is itself important.
Don't forget that each test will become an object in a database, and an
extraction routine will build the files which you will then load as a test
suite. Thus with this paradigm, you MUST associate any auxiliary environmental
factors as part of the relevant tests, otherwise there is no way at
file-building time to determine what predicates should be defined where.
As you said, "It's nice to have for...non-file modularity", which is exactly
the case.
As for dollar-sign suffixes -- that's a holdover from BASIC, and is short for
"string". It isn't an attempt to avoid name conflicts. HOWEVER...I have been
meaning to stick all this stuff in its own package anyway.
And, yeah, the package system has to be working, but...
Thanks a lot for your comments. To summarize, the things I agree with:
Prefix syntax for TESTFORM, with the mentioned proviso. Some kind of global
*TEST-FAIL-ACTION*. Using the names :EVAL-ONLY and :COMPILE-ONLY. A Package
for test stuff. I disagree with: always evaluating the compare form. And,
lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
some misunderstanding of an earlier message.
Sha-Boom.
RB
∂28-Aug-86 1308 @REAGAN.AI.MIT.EDU:cfry@OZ.AI.MIT.EDU TEST MACRO - Fry's Comments
Received: from REAGAN.AI.MIT.EDU by SAIL.STANFORD.EDU with TCP; 28 Aug 86 13:08:00 PDT
Received: from JONES.AI.MIT.EDU by REAGAN.AI.MIT.EDU via CHAOS with CHAOS-MAIL id 44202; Thu 28-Aug-86 16:10:56-EDT
Date: Thu, 28 Aug 86 16:09 EDT
From: Christopher Fry <cfry@OZ.AI.MIT.EDU>
Subject: TEST MACRO - Fry's Comments
To: berman@vaxa.isi.edu, CL-Validation@SU-AI.ARPA
In-Reply-To: <8608271911.AA18922@vaxa.isi.edu>
Message-ID: <860828160939.2.CFRY@JONES.AI.MIT.EDU>
Thanks a lot for your comments. To summarize, the things I agree with:
Prefix syntax for TESTFORM, with the mentioned proviso. Some kind of global
*TEST-FAIL-ACTION*. Using the names :EVAL-ONLY and :COMPILE-ONLY. A Package
for test stuff.
Good.
I disagree with: always evaluating the compare form. And,
lastly, your comments on :FAILFORM and :ERROR$, and DEFTEST-SEQ may be due to
some misunderstanding of an earlier message.
I think the real thrust of my arguments was just to try to cut down the number of
keyword args in this test macro, and thus make it easier to remember what's going on.
Always evaling the comparison form cuts out the :eval-compare-form, and
just having one action taken when a test fails cuts out one of
:failform or :error$. You'll be using the test stuff more than anyone so you
will have implimentors myopia disease which is:
"You can remember all this stuff because you work with it daily."
But you also have the insight from being most experienced with the problem
and have the distinct advantage of implementing the code.
Please consider us less-frequent users when you add a new and/or confusing
feature [where confusing means non-Lisp-like].
Fry